Georgia’s Financial Contributions to 2016 Presidential Campaigns by Michael Harkin
========================================================
The dataset I’m going to explore in this project comes from the FEC (Federal Election Commission). It contains data about individual financial contributions from Georgia citizens to the 2016 United States presidential campaigns. I live in Atlanta, generally considered a blue region in a predominantly red state. Georgia went to Donald Trump in the 2016 election. I am curious what Georgia’s campaign contributions looked like in that election. I am also wondering how Atlanta’s contributions compared to those from the rest of the state.
First I’ll load in the data and take a look at the accompanying documentation for 2016 datasets from the FEC website.
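The loading code itself is not shown here, so the following is only a minimal sketch of that step. It assumes the FEC export is a CSV file; the file path and the two rows are made-up stand-ins, and only the column names and the `ga` data frame name come from this report.

```r
# Toy stand-in for the FEC export: two made-up rows using a few of the real column names
csv_text <- c(
  "cand_nm,contbr_city,contb_receipt_amt,contb_receipt_dt",
  "\"Sanders, Bernard\",ATLANTA,27.00,06-MAR-16",
  "\"Trump, Donald J.\",MARIETTA,80.00,16-AUG-16"
)
path <- tempfile(fileext = ".csv")  # hypothetical location; the real file name is not given
writeLines(csv_text, path)

# stringsAsFactors = TRUE gives Factor columns like those in the str() output
ga <- read.csv(path, stringsAsFactors = TRUE)

names(ga)  # column names
str(ga)    # column types and a preview of the values
```

Reading text columns as factors is what makes `summary()` print level counts instead of just a length and class.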
Now that the data is loaded in, I’ll take a look at the names of the 18 variables in the dataset alongside the aforementioned documentation.
## [1] "cmte_id" "cand_id" "cand_nm"
## [4] "contbr_nm" "contbr_city" "contbr_st"
## [7] "contbr_zip" "contbr_employer" "contbr_occupation"
## [10] "contb_receipt_amt" "contb_receipt_dt" "receipt_desc"
## [13] "memo_cd" "memo_text" "form_tp"
## [16] "file_num" "tran_id" "election_tp"
I’ll also look at a couple of quick summaries of each column in the data, as well as look at the data’s structure.
## cmte_id cand_id cand_nm
## C00575795:68387 P00003392:68387 Clinton, Hillary Rodham :68387
## C00580100:29690 P80001571:29690 Trump, Donald J. :29690
## C00577130:26085 P60007168:26085 Sanders, Bernard :26085
## C00574624:14818 P60006111:14818 Cruz, Rafael Edward 'Ted':14818
## C00573519: 7151 P60005915: 7151 Carson, Benjamin S. : 7151
## C00458844: 3292 P60006723: 3292 Rubio, Marco : 3292
## (Other) : 4562 (Other) : 4562 (Other) : 4562
## contbr_nm contbr_city contbr_st contbr_zip
## RALSTON, JULIE : 323 ATLANTA :33074 GA:153985 30041 : 495
## TATE, JOE : 224 MARIETTA: 7446 30075 : 460
## GLADIN, JANICE : 194 DECATUR : 6457 30004 : 427
## FELMAN, SHOSHANA: 169 SAVANNAH: 4444 30062 : 415
## ANSLEY, FAYNE : 159 ROSWELL : 4050 30327 : 404
## FARR, HOLLY : 154 (Other) :98513 (Other):151781
## (Other) :152762 NA's : 1 NA's : 3
## contbr_employer contbr_occupation
## RETIRED :26920 RETIRED :39642
## N/A :18429 NOT EMPLOYED : 7973
## SELF-EMPLOYED :10277 INFORMATION REQUESTED: 7681
## INFORMATION REQUESTED: 7729 ATTORNEY : 4034
## NONE : 5547 PHYSICIAN : 2876
## (Other) :84004 (Other) :90703
## NA's : 1079 NA's : 1076
## contb_receipt_amt contb_receipt_dt
## Min. :-5900.0 12-JUL-16: 2303
## 1st Qu.: 19.0 11-JUL-16: 2151
## Median : 35.0 06-JUL-16: 1630
## Mean : 114.3 09-AUG-16: 1316
## 3rd Qu.: 100.0 29-FEB-16: 1272
## Max. :12500.0 12-AUG-16: 1260
## (Other) :144053
## receipt_desc memo_cd
## Refund : 1047 X : 37679
## REDESIGNATION TO GENERAL : 233 NA's:116306
## REDESIGNATION FROM PRIMARY : 230
## REATTRIBUTION / REDESIGNATION REQUESTED: 74
## REATTRIBUTION FROM SPOUSE : 73
## (Other) : 276
## NA's :152052
## memo_text form_tp
## * EARMARKED CONTRIBUTION: SEE BELOW: 25549 SA17A:116284
## * HILLARY VICTORY FUND : 12383 SA18 : 36654
## REDESIGNATION TO GENERAL : 233 SB28A: 1047
## REDESIGNATION FROM PRIMARY : 230
## EARMARKED FROM MAKE DC LISTEN : 227
## (Other) : 841
## NA's :114522
## file_num tran_id election_tp
## Min. :1003942 SA17.891563: 3 G2016:58673
## 1st Qu.:1077916 C10144015 : 2 O2016: 23
## Median :1104813 C1015402 : 2 P2016:94829
## Mean :1103086 C1015604 : 2 P2020: 1
## 3rd Qu.:1133832 C1015637 : 2 NA's : 459
## Max. :1146285 C10163714 : 2
## (Other) :153972
## 'data.frame': 153985 obs. of 18 variables:
## $ cmte_id : Factor w/ 23 levels "C00458844","C00500587",..: 14 14 6 7 7 7 7 14 14 14 ...
## $ cand_id : Factor w/ 23 levels "P00003392","P20002671",..: 22 22 1 12 12 12 12 22 22 22 ...
## $ cand_nm : Factor w/ 23 levels "Bush, Jeb","Carson, Benjamin S.",..: 21 21 4 18 18 18 18 21 21 21 ...
## $ contbr_nm : Factor w/ 42078 levels "'CALLEN, MATTHEW",..: 32327 33953 32999 23056 23056 23056 14209 33956 35731 35735 ...
## $ contbr_city : Factor w/ 936 levels " FORT BENNING",..: 439 931 850 47 47 47 735 507 434 220 ...
## $ contbr_st : Factor w/ 1 level "GA": 1 1 1 1 1 1 1 1 1 1 ...
## $ contbr_zip : Factor w/ 18165 levels "00039","00040",..: 5955 15873 4328 9226 9226 9226 16387 5574 15631 13290 ...
## $ contbr_employer : Factor w/ 12665 levels "--NONE--","-SELECT ONE -",..: 5906 9400 5906 8071 8071 8071 2529 12505 9400 9400 ...
## $ contbr_occupation: Factor w/ 6490 levels " EDUCATOR"," LIBRARIAN",..: 2777 4908 2777 3732 3732 3732 4434 2875 4908 4908 ...
## $ contb_receipt_amt: num 99.8 69.1 100 7 3 ...
## $ contb_receipt_dt : Factor w/ 676 levels "01-APR-15","01-APR-16",..: 480 503 110 78 100 121 121 665 190 509 ...
## $ receipt_desc : Factor w/ 26 levels "* EARMARKED CONTRIBUTION: SEE BELOW REATTRIBUTION/REFUND PENDING",..: NA NA NA NA NA NA NA NA NA NA ...
## $ memo_cd : Factor w/ 1 level "X": 1 1 1 NA NA NA NA 1 1 1 ...
## $ memo_text : Factor w/ 114 levels "*","* EARMARKED CONTRIBUTION: SEE BELOW",..: NA NA 9 2 2 2 2 NA NA NA ...
## $ form_tp : Factor w/ 3 levels "SA17A","SA18",..: 2 2 2 1 1 1 1 2 2 2 ...
## $ file_num : int 1146165 1146165 1091718 1077404 1077404 1077404 1077404 1146165 1146165 1146165 ...
## $ tran_id : Factor w/ 153625 levels "A0007E27F70A14A83828",..: 103200 108291 47105 137106 137273 137289 137516 109370 113478 115570 ...
## $ election_tp : Factor w/ 4 levels "G2016","O2016",..: 1 1 3 3 3 3 3 1 1 1 ...
Looking over these summaries, here are a few things I immediately wonder about:
For ease of analysis later on, I’m going to convert the contb_receipt_dt column to R’s built-in Date format.
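A minimal sketch of that conversion, using three values that appear in the output above. The format string is an assumption based on how the dates are printed (day, abbreviated month, two-digit year), and `%b` relies on an English locale.

```r
# Sample values taken from the contb_receipt_dt output above
raw_dates <- factor(c("12-JUL-16", "29-FEB-16", "01-APR-15"))

# %d = day, %b = abbreviated month name, %y = two-digit year.
# %b is locale dependent; this assumes English month names.
receipt_dt <- as.Date(as.character(raw_dates), format = "%d-%b-%y")

receipt_dt
class(receipt_dt)
range(receipt_dt)  # Date values sort and compare chronologically
```

Converting through `as.character()` first avoids any surprises from the factor's integer codes.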
As a starting point I’ll take a look at a table containing the names of each candidate and the number of contributions they received, keeping in mind that each of these counts includes refunds.
##
## Bush, Jeb Carson, Benjamin S.
## 922 7151
## Christie, Christopher J. Clinton, Hillary Rodham
## 45 68387
## Cruz, Rafael Edward 'Ted' Fiorina, Carly
## 14818 929
## Graham, Lindsey O. Huckabee, Mike
## 75 291
## Jindal, Bobby Johnson, Gary
## 12 451
## Kasich, John R. Lessig, Lawrence
## 599 16
## McMullin, Evan O'Malley, Martin Joseph
## 41 17
## Paul, Rand Perry, James R. (Rick)
## 750 27
## Rubio, Marco Sanders, Bernard
## 3292 26085
## Santorum, Richard J. Stein, Jill
## 14 157
## Trump, Donald J. Walker, Scott
## 29690 202
## Webb, James Henry Jr.
## 14
Unsurprisingly, Trump and Hillary Clinton had the two highest counts of contributions. I am surprised, however, that Bernie Sanders received only about 3,600 fewer contributions in Georgia than Trump, the eventual winner. This conflicts with my prior assumptions about how red-leaning Georgia is.
In addition to the number of contributions associated with each candidate, it’d be helpful to know the total amount of money received by each candidate.
## ga$cand_nm: Bush, Jeb
## [1] 796260
## --------------------------------------------------------
## ga$cand_nm: Carson, Benjamin S.
## [1] 842664.2
## --------------------------------------------------------
## ga$cand_nm: Christie, Christopher J.
## [1] 34478
## --------------------------------------------------------
## ga$cand_nm: Clinton, Hillary Rodham
## [1] 7045095
## --------------------------------------------------------
## ga$cand_nm: Cruz, Rafael Edward 'Ted'
## [1] 1193164
## --------------------------------------------------------
## ga$cand_nm: Fiorina, Carly
## [1] 167382
## --------------------------------------------------------
## ga$cand_nm: Graham, Lindsey O.
## [1] 64273.62
## --------------------------------------------------------
## ga$cand_nm: Huckabee, Mike
## [1] 58327.5
## --------------------------------------------------------
## ga$cand_nm: Jindal, Bobby
## [1] 3600
## --------------------------------------------------------
## ga$cand_nm: Johnson, Gary
## [1] 101033.9
## --------------------------------------------------------
## ga$cand_nm: Kasich, John R.
## [1] 184881.5
## --------------------------------------------------------
## ga$cand_nm: Lessig, Lawrence
## [1] 3618.38
## --------------------------------------------------------
## ga$cand_nm: McMullin, Evan
## [1] 5987
## --------------------------------------------------------
## ga$cand_nm: O'Malley, Martin Joseph
## [1] 19050
## --------------------------------------------------------
## ga$cand_nm: Paul, Rand
## [1] 121643.6
## --------------------------------------------------------
## ga$cand_nm: Perry, James R. (Rick)
## [1] 12870
## --------------------------------------------------------
## ga$cand_nm: Rubio, Marco
## [1] 777207.9
## --------------------------------------------------------
## ga$cand_nm: Sanders, Bernard
## [1] 1075672
## --------------------------------------------------------
## ga$cand_nm: Santorum, Richard J.
## [1] 8507.79
## --------------------------------------------------------
## ga$cand_nm: Stein, Jill
## [1] 34510
## --------------------------------------------------------
## ga$cand_nm: Trump, Donald J.
## [1] 4935612
## --------------------------------------------------------
## ga$cand_nm: Walker, Scott
## [1] 114254
## --------------------------------------------------------
## ga$cand_nm: Webb, James Henry Jr.
## [1] 3800
The highest totals, in dollars, belong to Clinton (7,045,095), Trump (4,935,612), Ted Cruz (1,193,164), Sanders (1,075,672), and Ben Carson (842,664). These totals include both primary and general election contributions, and they are net of refunds because refund rows carry negative amounts.
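The counts and totals above can be outlined as follows. This is a sketch on made-up rows: the `ga_toy` data frame and the candidate labels are hypothetical, while the column names and the use of `by()` follow the output shown in this report.

```r
# Toy rows (made up) standing in for the full data frame
ga_toy <- data.frame(
  cand_nm = factor(c("Candidate A", "Candidate A",
                     "Candidate B", "Candidate B", "Candidate B")),
  contb_receipt_amt = c(25, -25, 100, 50, 10)
)

# Number of rows per candidate (refund rows are counted too)
n_by_cand <- table(ga_toy$cand_nm)

# Net dollars per candidate; negative refund rows reduce the total
total_by_cand <- by(ga_toy$contb_receipt_amt, ga_toy$cand_nm, sum)

n_by_cand
total_by_cand
```

In the toy data, Candidate A has two rows but a net total of zero, which shows why counts and dollar totals can tell different stories.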
I’d like to add the candidates’ gender and party to the contribution records. This additional information would potentially make for some interesting analyses. In order to do this, I’ll write out character vectors of candidates within each political party based on information I collected from the FEC.
I’ll use nested ifelse statements to assign each contribution to the appropriate political party. I’ll also do the same kind of categorization using character vectors to match the candidate’s gender to each contribution.
Following the format of variables as they came titled, I’ll name these new columns cand_party (candidate party) and cand_gender (candidate gender).
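A minimal sketch of this categorization on a handful of names. The helper vector names (`democrats`, `greens`, and so on) and the `ga_toy` data frame are hypothetical; the candidate names are spelled as they appear in `cand_nm`, and anyone not matched by an earlier test is treated as Republican.

```r
# Party vectors; names are written exactly as they appear in cand_nm
democrats    <- c("Clinton, Hillary Rodham", "Sanders, Bernard",
                  "O'Malley, Martin Joseph", "Lessig, Lawrence",
                  "Webb, James Henry Jr.")
greens       <- c("Stein, Jill")
libertarians <- c("Johnson, Gary")
independents <- c("McMullin, Evan")
women        <- c("Clinton, Hillary Rodham", "Fiorina, Carly", "Stein, Jill")

# A few candidate names standing in for the cand_nm column
ga_toy <- data.frame(
  cand_nm = c("Clinton, Hillary Rodham", "Trump, Donald J.", "Stein, Jill",
              "Johnson, Gary", "McMullin, Evan", "Fiorina, Carly"),
  stringsAsFactors = FALSE
)

# Nested ifelse: test each party in turn; everyone left over is Republican
ga_toy$cand_party <-
  ifelse(ga_toy$cand_nm %in% democrats, "Democrat",
  ifelse(ga_toy$cand_nm %in% greens, "Green",
  ifelse(ga_toy$cand_nm %in% libertarians, "Libertarian",
  ifelse(ga_toy$cand_nm %in% independents, "Independent",
         "Republican"))))

# Gender needs only one test because there are two categories
ga_toy$cand_gender <- ifelse(ga_toy$cand_nm %in% women, "female", "male")

table(ga_toy$cand_party)
table(ga_toy$cand_gender)
```

Because `ifelse()` is vectorized, the same expression labels every row of the full data frame in one pass.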
I’ll take a quick look at summaries and plots of these new variables.
##
## female male
## 69473 84512
##
## Democrat Green Independent Libertarian Republican
## 94519 157 41 451 58817
The Democratic candidates received over 35,000 more contributions than the Republican candidates, despite there being 15 Republican candidates during the primary elections. Additionally, there are only 15,000 more contributions associated with male candidates than female candidates, even though there were 20 male candidates and 3 female candidates.
The third-party candidates are barely perceptible in this bar graph, so I’ll apply a log10 scale to the y-axis in order to get a better sense of the distribution.
Looking at this data on a log10 y-scale makes it more apparent that the number of contributions to third-party candidates is not insignificant. Gary Johnson, the Libertarian candidate, received 451 contributions and Jill Stein, the Green candidate, received 157 contributions. The bar chart without the log10 transformation makes it look like the Independent candidate, Evan McMullin, has 0 contributions, whereas he actually has 41.
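A sketch of a log10-scaled bar chart, built directly from the party counts printed earlier. It assumes the ggplot2 package is installed; the `party_counts` data frame and the axis labels are illustrative and not the original plotting code.

```r
library(ggplot2)

# Counts copied from the cand_party table earlier in this report
party_counts <- data.frame(
  cand_party = c("Democrat", "Green", "Independent", "Libertarian", "Republican"),
  n = c(94519, 157, 41, 451, 58817)
)

# geom_col() draws one bar per row; scale_y_log10() spaces the axis by powers of ten
p <- ggplot(party_counts, aes(x = cand_party, y = n)) +
  geom_col() +
  scale_y_log10() +
  labs(x = "Candidate party", y = "Number of contributions (log10 scale)")

print(p)
```

On a log10 axis, equal distances represent equal ratios, so a bar of 41 stays visible next to a bar of 94,519.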
I’d like to get a better picture of how contributions to the individual candidates break down. First I’ll take a look at a bar chart of the Democratic candidates. I’ll apply a log10 transformation to the y-axis in order to be able to properly view contributions to all the candidates. I’ll also swap the x-axis and y-axis for ease of reading.
Clinton, the Democratic nominee, received considerably more donations in Georgia than any other Democratic candidate, getting around 42,000 more contributions than Sanders, the runner-up.
The other Democratic candidates received very few contributions. Here are the contribution counts for the other three Democrats:
Now I’ll turn to the Republicans. I’ll do the same log10 transformation to the y-axis in order to be able to see all the contributions properly, and will swap the x- and y-axis for ease of reading.
There’s significant variation in the number of contributions received by the Republican candidates. The four highest bars represent the number of contributions received by Trump, Ted Cruz, Ben Carson, and Marco Rubio.
As for the lower-ranking candidates, here are their contribution counts:
I’d also like to look at the candidates outside of the two major political parties.
These three candidates received far fewer contributions than the top Democratic and Republican candidates did. Gary Johnson, the Libertarian candidate and the recipient of the highest number of contributions among the third-party candidates, has just over 450 contributions associated with his campaign.
I also need to take a look at contribution receipt amounts. First I will run a quick summary of the contb_receipt_amt variable, which represents the receipt amount for each contribution, in order to get a sense of the range of values of contributions received.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5900.0 19.0 35.0 114.3 100.0 12500.0
There are at least a few negative values in the dataset, and there appear to be both positive and negative outliers. The negative values represent refunds to contributors. According to the summary of the dataset’s receipt_desc variable, there are 1,047 rows of the dataset that are marked as “Refund.” I’ll keep this in mind as I continue exploring the data.
According to FEC rules, an individual maximum of 2,700 dollars can be contributed to a primary or general presidential election. It’s legal to contribute as much as 5,400 dollars to a campaign if half the amount is designated for the primary election and the other half is designated to the general election. For this reason I’m curious about the significant outliers in this data, particularly the low of -5,900 dollars and the high of 12,500 dollars.
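One way to single out rows like these is to compare each amount against the limits just described. This is a sketch on made-up amounts plus the two extremes from the summary; the variable names are hypothetical.

```r
# Limits as described above: 2,700 dollars per election, 5,400 for primary plus general
per_election_limit <- 2700
combined_limit <- 2 * per_election_limit

# Made-up amounts, plus the two extremes from the summary (-5900 and 12500)
amt <- c(35, 2700, 5400, -5900, 12500)

# abs() so that large refunds are flagged along with large contributions
over_limit <- abs(amt) > combined_limit
amt[over_limit]
```

A flagged row is not necessarily a violation; it may be a redesignation, a reattribution, or a correction, so it is only a prompt to read the memo fields.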
While looking at the overall distribution of contributions, it’d be helpful to look at a histogram of the range of receipt amounts.
Most contributions to the various presidential campaigns appear to be less than 500 dollars per this initial qplot graph. There are also some real outliers in the data: a handful of large negative values and some very high positive values.
I’m going to take a closer look at the tallest chunk of the distribution, adjusting the binwidth and the axes to get a sense of the distribution of contributions under 500 dollars.
The data remains positively skewed even zoomed in this far. I can now see that most contributions are 100 dollars or less. I’m going to apply a transformation to the data using scale_x_sqrt to see if that’ll give a helpful view of the distribution’s shape. I’ll also reduce the binwidth to 1.
The square root transformation results in a similarly positively skewed distribution. I’ll try a log10 transformation of the data to see if that’s helpful.
A log10 transformation gives an approximately normal distribution of the data for contribution amount. However, because it’s a log transformation, it doesn’t incorporate the negative values. Same goes for the square root transformation. That said, the refunds make up only a small portion of the overall data (less than 1 percent).
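The effect on non-positive values can be seen in a small sketch. The amounts are made up; the final line uses the refund and row counts reported earlier in this document.

```r
# Made-up amounts, including one refund (negative) and one zero
amt <- c(5, 25, 100, 1000, -50, 0)

# log10 is only defined for positive values, so keep those
positive_amt <- amt[amt > 0]
log_amt <- log10(positive_amt)
log_amt

# Share of toy rows that a log scale cannot show
mean(amt <= 0)

# With the counts from this dataset: 1,047 refund rows out of 153,985
refund_pct <- round(100 * 1047 / 153985, 2)
refund_pct
```

Filtering before transforming keeps the dropped rows an explicit choice instead of a side effect of the scale.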
I’d like to zoom in on the lower end of the distribution (minus the log10 transformation) to get a better idea of how many contributions there were at certain amounts.
Many of these contributions are below 50 dollars, so I’d like to zoom in a bit more on the part of the histogram between 0 and 50 dollars.
Most of the contributions are multiples of 5 dollars, with the highest number of contributions coming in at 25 dollars. There is also a high number of 5-, 10-, 50-, and 100-dollar contributions.
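The same pattern can be checked numerically instead of by reading the histogram. A sketch on made-up amounts; `amt` stands in for the `contb_receipt_amt` column.

```r
# Made-up amounts standing in for contb_receipt_amt
amt <- c(25, 25, 25, 10, 10, 5, 50, 100, 27, 3, 19)

# Restrict to the 0-50 dollar window discussed above
small <- amt[amt > 0 & amt <= 50]

# Count each distinct amount, most frequent first
amount_counts <- sort(table(small), decreasing = TRUE)
head(amount_counts, 5)

# Share of these amounts that are whole multiples of 5 dollars
share_mult5 <- mean(small %% 5 == 0)
share_mult5
```

The modulo test assumes whole-dollar or exactly representable amounts; for amounts with cents, rounding to cents first would be safer.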
Another potentially interesting component of the data is the set of dates on which contributions were received. Earlier I converted the contb_receipt_dt column to the R date format, which should help with displaying and analyzing the data correctly.
First I’ll do a histogram of receipt dates for every contribution in Georgia.
There’s a surprisingly large number of contributions in the year 2015, and even a few in 2014. The contributions ramp up quite a bit in 2016. It makes sense that the majority of contributions would come in during 2016 given that presidential campaigns don’t seem to ramp up until the year of the election itself.
I’m curious what this distribution looks like with the binwidth set to 1 and the x-axis confined to 2015 through 2016.
This results in a much more granular histogram with distinct peaks around March 2016, July 2016, and just before Election Day in November 2016. In the multivariate analysis I want to be sure to examine whether the number of contributions and their amounts have any relationship with the primary and general elections.
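The peaks can also be located by counting rows per month or per day. This is a sketch on made-up dates that are assumed to be already converted to the Date class; the variable names are hypothetical.

```r
# Made-up receipt dates, already converted to Date
receipt_dt <- as.Date(c("2015-06-30", "2016-03-01", "2016-03-15",
                        "2016-07-11", "2016-07-12", "2016-11-07"))

# Keep the 2015-2016 window
in_window <- receipt_dt >= as.Date("2015-01-01") &
  receipt_dt <= as.Date("2016-12-31")

# Collapse each date to its year-month and count rows per month
by_month <- table(format(receipt_dt[in_window], "%Y-%m"))
by_month

# Count per day, the tabular counterpart of a histogram with binwidth 1
by_day <- table(receipt_dt[in_window])
by_day
```

Sorting either table in decreasing order lists the busiest months or days directly.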
Each contribution in the dataset is tagged with one of three form types. These categorizations refer to lines in Form 3P, an FEC document that requires candidates to report receipts and disbursements. Here are the descriptions of each line:
Form 3P Line 17A: Contributions (other than loans) from Individuals/Persons Other than Political Committee
Form 3P Line 18: Transfers from Other Authorized Committees
Form 3P Line 28A: Refunds of Contributions to Individuals/Persons Other Than Political Committees
Here’s a quick look at how many rows of each form type appear in the dataset for Georgia.
##
## SA17A SA18 SB28A
## 116284 36654 1047
The vast majority of contributions (116,284) are categorized as “SA17A”, meaning they are contributions from individuals. A further 36,654 are categorized as “SA18”, meaning they are reported as transfers from other authorized committees. The remaining 1,047 are “SB28A” refunds. It’d be interesting in the bivariate section to see if these form types have any type of correspondence with donation amounts or contributor location.
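As a quick worked check, the counts printed above can be turned into shares of all rows. The vector name `form_counts` is hypothetical; the numbers are copied from the form_tp table.

```r
# Counts copied from the form_tp table above
form_counts <- c(SA17A = 116284, SA18 = 36654, SB28A = 1047)

total_rows <- sum(form_counts)                     # should match the row count of the dataset
form_share <- round(prop.table(form_counts), 3)    # share of rows per form type

total_rows
form_share
```

The three counts add up to 153,985, so every row in the dataset carries one of these three form types.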
I’m curious about the names that have hundreds of contributions associated with them. The table listing every contributor’s name is far too long to print out, but below are a few contributors with many transactions.
## RALSTON, JULIE TATE, JOE GLADIN, JANICE FELMAN, SHOSHANA
## 323 224 194 169
## ANSLEY, FAYNE FARR, HOLLY
## 159 154
I’ll look at the sum of the top three of these folks’ contributions.
Julie Ralston:
## [1] 415
Joe Tate:
## [1] 1446.57
Janice Gladin:
## [1] 2197.71
None of these three contributors passes the 2,700-dollar mark. I’m wondering if there was perhaps a program that allowed people to contribute a certain amount of money per week or per month on a recurring basis. That might help explain the recurring payments in cases like this.
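A sketch of how per-contributor totals could be computed for everyone at once. The names and amounts are made up, and grouping by name alone assumes that each name refers to a single person, which may not hold in the real data.

```r
# Made-up contributors and amounts; many small rows can share one name
ga_toy <- data.frame(
  contbr_nm = c("DOE, JANE", "DOE, JANE", "DOE, JANE",
                "ROE, RICHARD", "ROE, RICHARD"),
  contb_receipt_amt = c(5, 5, 10, 2700, 500)
)

# Net total and number of transactions per name (both ordered by name)
total_by_name <- tapply(ga_toy$contb_receipt_amt, ga_toy$contbr_nm, sum)
n_by_name <- table(ga_toy$contbr_nm)

totals <- data.frame(
  contbr_nm = names(total_by_name),
  n_transactions = as.vector(n_by_name),
  total = as.vector(total_by_name)
)

# Flag anyone whose total exceeds the 2,700-dollar per-election limit
totals$over_2700 <- totals$total > 2700
totals
```

A total above 2,700 dollars is not automatically a problem, since primary and general election contributions are limited separately.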
In the bivariate plots section, I want to look at individual contributor totals if possible.
One of the dataset’s columns is named memo_cd, and it indicates whether or not there is “memo text” associated with the contribution. If an ‘X’ appears in this column then there is an explanatory memo of some kind associated with that contribution.
##
## X
## 37679
Almost 38,000 of the contributions have “memo text” associated with them, and there are over 100 distinct memos. Here’s a sample of the memos that appear.
##
## *
## 3
## * EARMARKED CONTRIBUTION: SEE BELOW
## 25549
## * EARMARKED CONTRIBUTION: SEE BELOW $2700 TO BE RE-ATTRIBUTED TO SPOUSE
## 1
## * EARMARKED CONTRIBUTION: SEE BELOW 616993441
## 1
## * EARMARKED CONTRIBUTION: SEE BELOW REATTRIBUTION/REFUND PENDING
## 10
## * EARMARKED CONTRIBUTION: SEE BELOW REFUNDED ON 10/14/2016
## 1
Here are a few of the notable categories from the extended list:
The election_tp column specifies which election each contribution is connected to. The vast majority of contributions will likely be associated with the 2016 election but I’d like to see what the data says.
##
## G2016 O2016 P2016 P2020
## 58673 23 94829 1
Nearly 95,000 of the contributions are associated with primaries (P2016) and a further 59,000 are associated with the general election (G2016). There are 23 labeled as “other” (O2016) - it’d be interesting to take a look at those - and one contribution labeled as for the 2020 primaries (P2020).
I’ll look at counts of election type and use a log10 scale on the y-axis in order to properly see the counts.
There is a significant number of ‘NA’ values for election type. I’ll look at these contributions shortly.
First I’ll look at the contributions marked as “O2016” (2016 “Other”) and the contribution marked as “P2020” (for the 2020 primaries).
2016 “Other”:
## cmte_id cand_id cand_nm contbr_nm
## 94443 C00581199 P20003984 Stein, Jill EMRICH, MICHELLE
## 94446 C00581199 P20003984 Stein, Jill MENTZER, MARYKAY
## 94450 C00581199 P20003984 Stein, Jill WRIGHT, CYNTHIA
## 94451 C00581199 P20003984 Stein, Jill YALCINKAYA, YASEMIN
## 94497 C00581199 P20003984 Stein, Jill ANANIA, FRANK
## 94549 C00581199 P20003984 Stein, Jill ARLEDGE, JOHN
## contbr_city contbr_st contbr_zip contbr_employer
## 94443 ATLANTA GA 30305 N/A
## 94446 DECATUR GA 30033 NFP
## 94450 PEACHTREE CORNERS GA 30092 SELF-EMPLOYED
## 94451 ATLANTA GA 30342 SELF-EMPLOYED
## 94497 ATLANTA GA 30324 EMORY UNIVERSITY
## 94549 CARROLLTON GA 30116 WEST GEORGIA GASTROENTEROLOGY
## contbr_occupation contb_receipt_amt
## 94443 PHYSICIAN 250
## 94446 CONSULTANT 500
## 94450 REAL ESTATE 500
## 94451 CO-OWNER OF A TREE REMOVAL COMPANY 500
## 94497 PHYSICIAN-SCIENTIST 500
## 94549 DOCTOR 1000
## contb_receipt_dt receipt_desc memo_cd memo_text form_tp file_num
## 94443 2016-11-24 <NA> <NA> <NA> SA17A 1134336
## 94446 2016-11-24 <NA> <NA> <NA> SA17A 1134336
## 94450 2016-11-24 <NA> <NA> <NA> SA17A 1134336
## 94451 2016-11-24 <NA> <NA> <NA> SA17A 1134336
## 94497 2016-11-23 <NA> <NA> <NA> SA17A 1134336
## 94549 2016-11-24 <NA> <NA> <NA> SA17A 1134336
## tran_id election_tp cand_party cand_gender
## 94443 SA17A.236931 O2016 Green female
## 94446 SA17A.238169 O2016 Green female
## 94450 SA17A.239535 O2016 Green female
## 94451 SA17A.239547 O2016 Green female
## 94497 SA17A.236135 O2016 Green female
## 94549 SA17A.236165 O2016 Green female
2020 Primaries:
## cmte_id cand_id cand_nm contbr_nm contbr_city
## 70339 C00578757 P60007697 Graham, Lindsey O. BELL, THOMAS D. ATLANTA
## contbr_st contbr_zip contbr_employer contbr_occupation
## 70339 GA 303051116 MESA CAPITAL PARTNERS INVESTOR
## contb_receipt_amt contb_receipt_dt receipt_desc
## 70339 2600 2016-01-27 REDESIGNATION FROM GENERAL
## memo_cd memo_text form_tp file_num tran_id
## 70339 X REDESIGNATION FROM GENERAL SA17A 1051548 SA17.77506
## election_tp cand_party cand_gender
## 70339 P2020 Republican male
The contributions labeled “other” are all donations to Jill Stein, the Green nominee, made between November 23 and November 28, 2016.
The donation labeled as for the 2020 primaries was made to Lindsey Graham on January 27, 2016 and is labeled as “Redesignation from General.”
Now I’d like to take a look at the rows where the election type is “NA.”
Number of rows with election type “NA”:
## [1] 459 20
It turns out that 459 of the transactions have their election type labeled as “NA.” I’m going to take a closer look at these contributions.
## cmte_id cand_id cand_nm
## C00580100:408 P80001571:408 Trump, Donald J. :408
## C00623884: 26 P60022654: 26 McMullin, Evan : 26
## C00574624: 11 P60006111: 11 Cruz, Rafael Edward 'Ted': 11
## C00581199: 7 P20003984: 7 Stein, Jill : 7
## C00573519: 5 P60005915: 5 Carson, Benjamin S. : 5
## C00577312: 1 P60007242: 1 Fiorina, Carly : 1
## (Other) : 1 (Other) : 1 (Other) : 1
## contbr_nm contbr_city contbr_st contbr_zip
## DOMINGUEZ, ALEJANDRO: 3 ATLANTA : 41 GA:459 30092 : 9
## HOLLOWAY, ANNA : 3 MARIETTA: 19 30327 : 9
## JONES, MILDRED : 3 SAVANNAH: 14 30338 : 8
## LEE, LAURA : 3 NORCROSS: 10 31411 : 8
## SMITH, LINDA J : 3 AUGUSTA : 9 30022 : 7
## CAWOOD, JOSEPH : 2 COLUMBUS: 9 30067 : 7
## (Other) :442 (Other) :357 (Other):411
## contbr_employer contbr_occupation
## RETIRED :134 RETIRED :134
## SELF-EMPLOYED : 71 INFORMATION REQUESTED: 61
## INFORMATION REQUESTED: 62 PHYSICIAN : 9
## HOMEMAKER : 8 SALES : 9
## CHICK-FIL-A I.N.C. : 3 ATTORNEY : 8
## (Other) :163 (Other) :220
## NA's : 18 NA's : 18
## contb_receipt_amt contb_receipt_dt
## Min. :-5375.0 Min. :2015-09-28
## 1st Qu.: 80.0 1st Qu.:2016-08-26
## Median : 240.0 Median :2016-09-06
## Mean : 284.9 Mean :2016-09-23
## 3rd Qu.: 400.0 3rd Qu.:2016-11-09
## Max. : 2700.0 Max. :2016-11-17
##
## receipt_desc
## Refund : 18
## * EARMARKED CONTRIBUTION: SEE BELOW REATTRIBUTION/REFUND PENDING: 0
## * REATTRIBUTED FROM MARGARET BEAR : 0
## * REATTRIBUTED TO ANN EDMUNDSON : 0
## * REATTRIBUTED TO ELIZABETH PARKER : 0
## (Other) : 0
## NA's :441
## memo_cd
## X :408
## NA's: 51
##
##
##
##
##
## memo_text
## * : 0
## * EARMARKED CONTRIBUTION: SEE BELOW : 0
## * EARMARKED CONTRIBUTION: SEE BELOW $2700 TO BE RE-ATTRIBUTED TO SPOUSE: 0
## * EARMARKED CONTRIBUTION: SEE BELOW 616993441 : 0
## * EARMARKED CONTRIBUTION: SEE BELOW REATTRIBUTION/REFUND PENDING : 0
## (Other) : 0
## NA's :459
## form_tp file_num tran_id election_tp
## SA17A: 33 Min. :1029240 SA17A.11742: 1 G2016: 0
## SA18 :408 1st Qu.:1104813 SA17A.11901: 1 O2016: 0
## SB28A: 18 Median :1111847 SA17A.12497: 1 P2016: 0
## Mean :1117813 SA17A.13035: 1 P2020: 0
## 3rd Qu.:1133930 SA17A.13068: 1 NA's :459
## Max. :1133930 SA17A.13173: 1
## (Other) :453
## cand_party cand_gender
## Length:459 Length:459
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
The vast majority of the transactions with election type “NA”, 408 of the 459, are to Trump’s campaign. There are also 408 rows with the form type “SA18”, meaning they are reported as transfers from other authorized committees. Additionally, nearly 30 percent of the contributions (134 of 459) are from people who are retired, and most of them are from parts of Georgia outside Atlanta.
I’ll filter the data.frame using dplyr to look more closely at contributions with election type “NA” and form type “SA18”.
## cmte_id cand_id cand_nm
## C00580100:408 P80001571:408 Trump, Donald J. :408
## C00458844: 0 P00003392: 0 Bush, Jeb : 0
## C00500587: 0 P20002671: 0 Carson, Benjamin S. : 0
## C00573519: 0 P20002721: 0 Christie, Christopher J. : 0
## C00574624: 0 P20003281: 0 Clinton, Hillary Rodham : 0
## C00575449: 0 P20003984: 0 Cruz, Rafael Edward 'Ted': 0
## (Other) : 0 (Other) : 0 (Other) : 0
## contbr_nm contbr_city contbr_st contbr_zip
## DOMINGUEZ, ALEJANDRO : 3 ATLANTA : 34 GA:408 30092 : 9
## SMITH, LINDA J : 3 MARIETTA: 17 30327 : 8
## CROSS, VIRGINIA R. : 2 SAVANNAH: 12 30338 : 8
## MADDOX, JAMES : 2 COLUMBUS: 9 30127 : 7
## MALLORY, PAGE MUNFORD: 2 NORCROSS: 9 30022 : 6
## MASON, JOHN T. : 2 AUGUSTA : 8 30062 : 6
## (Other) :394 (Other) :319 (Other):364
## contbr_employer contbr_occupation
## RETIRED :125 RETIRED :125
## SELF-EMPLOYED : 69 INFORMATION REQUESTED: 61
## INFORMATION REQUESTED : 62 PHYSICIAN : 9
## HOMEMAKER : 8 SALES : 9
## CHICK-FIL-A I.N.C. : 3 ATTORNEY : 8
## ARTIC CONCRETE CONTRACTORS LLC: 2 HOMEMAKER : 8
## (Other) :139 (Other) :188
## contb_receipt_amt contb_receipt_dt
## Min. : 80 Min. :2016-08-16
## 1st Qu.: 80 1st Qu.:2016-08-29
## Median : 240 Median :2016-09-06
## Mean : 351 Mean :2016-10-03
## 3rd Qu.: 400 3rd Qu.:2016-11-09
## Max. :2700 Max. :2016-11-17
##
## receipt_desc
## * EARMARKED CONTRIBUTION: SEE BELOW REATTRIBUTION/REFUND PENDING: 0
## * REATTRIBUTED FROM MARGARET BEAR : 0
## * REATTRIBUTED TO ANN EDMUNDSON : 0
## * REATTRIBUTED TO ELIZABETH PARKER : 0
## REATTRIBUTION / REDESIGNATION REQUESTED : 0
## (Other) : 0
## NA's :408
## memo_cd
## X:408
##
##
##
##
##
##
## memo_text
## * : 0
## * EARMARKED CONTRIBUTION: SEE BELOW : 0
## * EARMARKED CONTRIBUTION: SEE BELOW $2700 TO BE RE-ATTRIBUTED TO SPOUSE: 0
## * EARMARKED CONTRIBUTION: SEE BELOW 616993441 : 0
## * EARMARKED CONTRIBUTION: SEE BELOW REATTRIBUTION/REFUND PENDING : 0
## (Other) : 0
## NA's :408
## form_tp file_num tran_id election_tp
## SA17A: 0 Min. :1104813 SA18.2447514: 1 G2016: 0
## SA18 :408 1st Qu.:1104813 SA18.2447523: 1 O2016: 0
## SB28A: 0 Median :1104813 SA18.2447556: 1 P2016: 0
## Mean :1119229 SA18.2447634: 1 P2020: 0
## 3rd Qu.:1133930 SA18.2447641: 1 NA's :408
## Max. :1133930 SA18.2447650: 1
## (Other) :402
## cand_party cand_gender
## Length:408 Length:408
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
All of the “NA” contributions labeled as transfers from authorized committees are contributions to Donald Trump that came in between August 16 and November 17, 2016. Almost half of these contributors are either retired, self-employed, or homemakers, and most of them live outside of Atlanta.
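A sketch of the dplyr filtering step described above, on made-up rows. It assumes the dplyr package is installed; `ga_toy`, `na_rows`, and `na_sa18` are hypothetical names, and the column names follow the dataset.

```r
library(dplyr)

# Made-up rows; NA in election_tp marks a missing election type
ga_toy <- data.frame(
  cand_nm = c("Trump, Donald J.", "Trump, Donald J.",
              "McMullin, Evan", "Clinton, Hillary Rodham"),
  form_tp = c("SA18", "SA18", "SA17A", "SA17A"),
  election_tp = c(NA, NA, NA, "G2016"),
  contb_receipt_amt = c(80, 240, 100, 25),
  stringsAsFactors = FALSE
)

# is.na() is needed here: election_tp == NA would itself evaluate to NA
na_rows <- filter(ga_toy, is.na(election_tp))

# Several conditions in one filter() call are combined with AND
na_sa18 <- filter(ga_toy, is.na(election_tp), form_tp == "SA18")

dim(na_rows)
summary(na_sa18$contb_receipt_amt)
```

If the columns are factors, `droplevels()` on the filtered result would remove the zero-count levels that otherwise clutter the `summary()` output.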
There are 153,985 observations of 18 variables. Each row in the dataset represents a contribution, refund, or redesignation from Georgia individuals to the 2016 presidential campaigns. This data is drawn from reports from each campaign as required by the Federal Election Commission (FEC).
The individual transactions have several columns’ worth of information. This information includes the amount and date of each transaction as well as the candidate it was made in support of, whether the transaction was for the primary or general election, and demographic information about the contributor. In cases where an individual had multiple transactions associated with their name, each individual transaction is recorded, as opposed to being consolidated into one row per contributor.
The main features of interest to me are the contribution amounts, the dates and designations of each contribution, and the demographic information about the contributors. The way this data is reported allows for several potential avenues of exploration. It gets a bit fuzzy with regard to the wide range of memos and designations that are used.
I think that the zip code, form type, and individual transaction IDs will help with reshaping and mutating the data in order to better understand patterns and trends in the contribution data. These parts of the data should be helpful with analyzing contributions by location.
I added two new variables: candidate party (cand_party) and gender (cand_gender). I used character vectors of the candidates’ names to do this.
There are some significant outliers in the distribution of contribution amounts. These outliers positively skew the contribution distributions. For instance, there’s a Republican contribution that’s 12,500 dollars, as well as one refund to a Republican contributor in the amount of -5,900 dollars.
I transformed the contribution amount data using a log10 scale. I did this because I wanted to determine whether the distribution was approximately normal, and it appeared to be positively skewed prior to transformation. The log10 transformation resulted in normal-ish distributions for all of the candidates I looked at who received at least a couple hundred contributions.
There were a few bar charts to which I applied a log10 transformation on the y-axis. This allowed me to get a better sense of the distribution in cases where the counts for individual categories were hard to perceive or seemed to be zero.
I also changed the contribution date column to R’s Date format in order to be able to work with that field in a more convenient way that accurately reflects that component of the data set.
I’d be interested in seeing what the individual parties’ contributions look like, particularly the Republicans and Democrats given that there is a decent sample size of contributions for each party.
I’ll start by using the by function to get a summary of contribution amounts divided up by party.
## ga$cand_party: Democrat
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700.0 10.0 25.0 86.2 50.0 5400.0
## --------------------------------------------------------
## ga$cand_party: Green
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -965.0 49.0 100.0 219.8 250.0 2700.0
## --------------------------------------------------------
## ga$cand_party: Independent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5 25 100 146 250 500
## --------------------------------------------------------
## ga$cand_party: Libertarian
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3 30 100 224 250 2700
## --------------------------------------------------------
## ga$cand_party: Republican
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5900.0 25.0 50.0 158.4 100.0 12500.0
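For reference, a summary like the one above can be produced along these lines. The toy data frame below is made up; only the column names mirror the dataset:

```r
# Toy data; the real ga data frame has many more rows and columns.
ga <- data.frame(
  cand_party = c("Democrat", "Democrat", "Democrat",
                 "Republican", "Republican", "Republican"),
  contb_receipt_amt = c(10, 25, 50, 25, 50, 1000)
)

# by() splits the first argument by the grouping vector and applies
# the function to each piece, here summary() on the amounts.
party_summaries <- by(ga$contb_receipt_amt, ga$cand_party, summary)
party_summaries
```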
The Democrats and Republicans have the lowest median contribution amounts at 25 dollars and 50 dollars respectively. Republicans, however, have a much higher mean contribution of 158 dollars compared to a mean contribution of 86 dollars to the Democratic candidates. It seems notable that the Republican candidates’ mean is that much higher given the high number of Republican candidates who ran and are represented in the dataset.
Additionally, the median and mean contribution amounts for candidates outside of the two major political parties were considerably higher than the median and mean contributions to Democratic candidates. Perhaps some contributors to third-party candidates and nominees pledge more money because they’re aware that there are fewer third-party campaign contributors than there are contributors to Democratic and Republican candidates and nominees.
A frequency polygon might be a useful way of looking at the distribution of contribution amounts across the different U.S. political parties.
Every perceptible peak in this frequency polygon represents either Democratic or Republican candidates. Let’s zoom in to the area between 0 and 250 dollars, and then further in at the area between 0 and 100 dollars, just to get a sense of the distribution of the lower contribution amounts that comprise the bulk of contributions received in this election.
It looks like the Democratic candidates had generally much higher levels of giving at amounts of 100 dollars or less. There are spikes at the 40-dollar and 80-dollar levels for Republican candidates, as well as spikes at the 200- and 250-dollar levels. However, nearly every other Republican spike is dwarfed by a larger Democratic spike above it. The Independent, Green, and Libertarian candidates barely register on this graph, appearing to be a single flat line just above a count of 0.
Now I’ll take a look at histograms of contributions for the various candidates. First I’ll look at the top two Democratic candidates.
Hillary Clinton:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700 10 25 103 75 2700
Bernie Sanders:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2500.00 10.00 27.00 41.24 50.00 2700.00
There is very little difference in median contribution amount between Clinton’s and Sanders’ campaigns. The median contribution to Sanders’ campaign is 27 dollars versus the median of 25 dollars for Clinton. However, the mean contribution amount for Clinton is notably higher than that of Sanders: 103 dollars versus 41 dollars. The variance is much higher for Clinton’s contributions as well.
Next I’ll write a function for making a histogram of a candidate’s contributions and use that for making histograms for individual candidates. The first two candidates I’ll use this function (make_contb_hist) for are Clinton and Sanders, the two leading Democratic candidates in the election.
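The body of make_contb_hist isn’t shown in this write-up, so the following is only a guess at what such a helper might look like. The arguments, the default bin width, and the toy data are all assumptions:

```r
library(ggplot2)

# Hypothetical implementation; the original function body is not shown.
make_contb_hist <- function(data, candidate, binwidth = 50) {
  cand_data <- data[data$cand_nm == candidate, ]
  ggplot(cand_data, aes(x = contb_receipt_amt)) +
    geom_histogram(binwidth = binwidth) +
    labs(title = candidate,
         x = "Contribution amount (dollars)",
         y = "Number of contributions")
}

# Toy data standing in for the real ga data frame.
ga <- data.frame(
  cand_nm = c(rep("Clinton, Hillary Rodham", 3), rep("Sanders, Bernard", 2)),
  contb_receipt_amt = c(10, 25, 2700, 27, 50)
)

clinton_hist <- make_contb_hist(ga, "Clinton, Hillary Rodham")
sanders_hist <- make_contb_hist(ga, "Sanders, Bernard")
```

Wrapping the plot in a function keeps the axis labels and bin width consistent from candidate to candidate, which makes the histograms easier to compare.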
Both of these histograms are positively skewed and reflect what we’re seeing in the summaries. Considering that most contributions are 250 dollars or less each, I’d like to zoom in a bit to get a better sense of the shape of these distributions.
As is the case for the overall data for receipt amounts, most of the contributions to both Democratic candidates were 50 dollars or less. This seems to make sense given that most contributors probably don’t put aside very much of their disposable income for presidential campaign contributions.
It’s notable that the maximum refund and maximum contribution amounts to Clinton’s campaign are -2,700 and 2,700 dollars respectively. The maximum contribution to Sanders’ campaign is also 2,700 dollars, with the maximum refund at the -2,500 mark. There aren’t any outliers in these two distributions that immediately suggest contributions above the legal amount.
I’d like to see what the distribution of contributions for both Democratic candidates looks like with a log10 transformation in light of this transformation having been helpful with transforming the overall contribution amount data.
The log10-transformed distributions appear to be approximately normal, with a slight positive skew to the contributions received by Clinton and a slight negative skew to contributions received by Sanders.
I’d like to see how the top Republican candidates compare in this regard.
Donald Trump:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5900.00 28.00 72.37 166.20 184.00 12500.00
Ted Cruz:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5375.00 25.00 50.00 80.52 100.00 5400.00
Ben Carson:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5000.0 25.0 50.0 117.8 100.0 10000.0
Jeb Bush:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3000.00 41.25 300.00 863.60 1700.00 5400.00
As with the Democratic candidates, most contributions are less than 250 dollars each. I’d like to zoom in a bit and get a better sense of what the distribution of Republican candidate contributions looks like in amounts of 250 dollars or less.
It’s now clear that most of the contributions to the leading Republican campaigns were less than 100 dollars. Only Trump’s campaign has a significant spike at the 200-dollar level.
I’d like to see what the Republican contribution data looks like after a log10 transformation on the x-axis.
The log10-transformed distribution of contributions received by Bush is relatively uniform. Carson, Cruz, and Trump received significantly more contributions than Bush, and the log-transformed distribution of contributions for these three are normal-ish, particularly those for Cruz and Trump.
For the sake of completeness and comparison I’d like to look at the distribution of contributions to third-party candidates.
There are orders of magnitude fewer contributions to these third-party candidates, but the data nonetheless appears long-tailed for Gary Johnson, the Libertarian nominee, and Jill Stein, the Green nominee. Based on what I looked at with the previous distributions, I’d like to see what a log10 transformation of the x-axis shows.
The contributions to Johnson and Stein have normal-ish distributions, while McMullin’s contribution distribution is comparatively uniform. McMullin only received 41 contributions in Georgia so the uniformity of the transformed distribution does not seem to convey any special meaning.
Taking a step back, it’d be handy to look at the two major-party nominees, Trump and Clinton, side by side in both summaries and histogram form.
Trump:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5900.00 28.00 72.37 166.20 184.00 12500.00
Clinton:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700 10 25 103 75 2700
It’s the Trump contribution distribution that contains the extreme outliers of -5,900 dollars and 12,500 dollars. Presumably these amounts reflect contributions that were divided up between the two elections - primary and general - or contributions that were refunded at least in part in order to remain under the 2,700-dollar individual threshold.
Trump had higher mean and median contribution amounts, but Clinton had a higher number of contributions overall, as is evident in the histograms. Both distributions appear approximately normal after a log10 transformation of the x-axis.
I’d like to look at the dates that Clinton and Trump received contributions.
The shapes of these graphs are really interesting. Clinton’s contributions steadily climb for the most part, with a dip occurring after the month of August 2016. Trump’s contributions go from minuscule to quite a lot as July arrives, after which there’s a dip in October (not unlike Clinton’s dip in August) which precedes upticks of contributions right before and after the election.
There’s an uptick in July for both candidates, which may have something to do with the Democratic and Republican National Conventions taking place in the month of July. These conventions mark when these two candidates officially became the major party nominees.
I’d like to get a closer look at the last three months of 2016. The election occurred on November 8, 2016, but there are contributions marked as late as December 31, 2016. I’m going to zoom in on these last three months and adjust the binwidth to display individual days.
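One way to get one bar per day is sketched below on randomly generated dates. The data frame, its name, and the faceting are assumptions for illustration. On a Date axis, ggplot2 interprets binwidth in days:

```r
library(ggplot2)

# Random dates between 2016-08-01 and 2016-12-31 for the two nominees.
set.seed(3)
nominees <- data.frame(
  cand_nm = sample(c("Clinton, Hillary Rodham", "Trump, Donald J."),
                   300, replace = TRUE),
  contb_receipt_dt = as.Date("2016-08-01") + sample(0:152, 300, replace = TRUE)
)

# Keep only the last three months of 2016.
last_quarter <- subset(nominees,
                       contb_receipt_dt >= as.Date("2016-10-01") &
                         contb_receipt_dt <= as.Date("2016-12-31"))

# binwidth = 1 means one bin per day on a Date axis.
daily_hist <- ggplot(last_quarter, aes(x = contb_receipt_dt)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~ cand_nm, ncol = 1)
```

Subsetting the data first, rather than setting axis limits, avoids dropping bins that straddle the edge of the date range.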
Contributions to both campaigns climbed and spiked in mid-October before dipping slightly. From there the contributions to Clinton’s campaign ramped up again, peaking in the days right before the election.
Trump continued to receive contributions after Election Day. These contributions steadily trickled in through early December.
Looking at a frequency polygon of this data might be useful.
These frequency polygons provide a nice chart of the last couple of months of the general election and how contributions to the two major party nominees unfolded during that time. Clinton’s fundraising in Georgia was considerably higher than that of Trump up until the moment of the election, after which Trump continued to receive contributions through the end of the calendar year.
Here’s a version of the figure above that encompasses the entire general election period after Clinton and Trump were formally nominated by their parties.
In the multivariate plots section, I’d like to look at contribution totals by date. This seems like it’d be helpful to get a fuller picture of how campaign contributions changed over the course of the election.
I’d like to look at a scatterplot showing individual contribution amounts over the course of time. I’ll start by taking a look at a scatterplot of every single transaction, positive and negative, over the range of dates represented in the dataset.
There is some serious overplotting going on here, and outliers in multiple directions. I’d like to zoom in on the area between 0 and 2700 dollars (the upper limit on individual contributions per election), as well as the time period from early 2015 through the end of 2016, where the vast majority of contributions appear to have taken place. I’ll also use the alpha parameter for geom_point to add some transparency and hopefully reduce the effect of the overplotting.
Given my earlier explorations, it’s no surprise that most contributions are under 500 dollars apiece. I’d like to zoom in even more and increase the transparency to see if that tells us anything new.
Even with alpha reduced to 1/100, we’re still seeing massive overplotting between the amounts of 0 and 50 dollars. I’ll zoom in once more and introduce some noise to the plot using geom_jitter to see what that shows.
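The combination of zooming, transparency, and jitter can be sketched as follows on randomly generated data. The alpha and jitter values are illustrative rather than the ones used for the plots here, and 1/10 is used instead of 1/100 because the toy data has far fewer points:

```r
library(ggplot2)

# Random contributions at common dollar levels across 2015 and 2016.
set.seed(1)
ga <- data.frame(
  contb_receipt_dt = as.Date("2015-01-01") + sample(0:700, 500, replace = TRUE),
  contb_receipt_amt = sample(c(5, 10, 25, 50, 100, 250), 500, replace = TRUE)
)

jitter_plot <- ggplot(ga, aes(x = contb_receipt_dt, y = contb_receipt_amt)) +
  # Vertical jitter only: dates stay exact, amounts get a little noise.
  geom_jitter(alpha = 1/10, width = 0, height = 2) +
  # coord_cartesian() zooms without discarding the points outside the window.
  coord_cartesian(ylim = c(0, 100))
```

Zooming with coord_cartesian rather than scale limits matters if a summary layer is added later, because scale limits remove out-of-range rows before any statistics are computed.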
This gives a better picture of the frequency of particular contribution levels over time. Starting in late 2015, contributions at the 25- and 50-dollar levels ramp up at what appears to be an almost identical rate and do not let up until early November 2016. Most contribution amounts are multiples of 5. The most solid-looking (but not lengthiest) lines observable in the graph are at the 5- and 10-dollar levels. The lengthiest lines are at the 25- and 50-dollar levels.
If possible I’d like to take a stab at looking at total contributions from individual contributors in Georgia. In order to do this I’m going to use the group_by, select, and summarise functions from the dplyr package to try and boil down the data.frame to individual contributor names and their total contributions to different candidates (allowing for the possibility that they may have donated to more than one candidate).
Here are the first 20 rows of a data.frame in which I try to consolidate individual contributor totals into a row for each contributor and the total they gave to a particular candidate:
## Source: local data frame [20 x 7]
## Groups: contbr_nm, contbr_city, contbr_zip, contbr_employer, contbr_occupation [20]
##
## contbr_nm contbr_city contbr_zip
## <fctr> <fctr> <dbl>
## 1 'CALLEN, MATTHEW FAYETTEVILLE 30215
## 2 'CALLOWAY, MAXWELL ATLANTA 30355
## 3 'SHELLITO, ROBIN AUGUSTA 30907
## 4 21ST CENTURY MAJORITY FUND ATLANTA 30328
## 5 A CIPCIC JR, JOSEPH BLUE RIDGE 30513
## 6 AAGAARD, THOMAS A. ROSWELL 30075
## 7 AARON, BILLYE S. ATLANTA 30311
## 8 AARON, DAVID CEDARTOWN 30125
## 9 AARON, DAVID CEDARTOWN 30125
## 10 AARON, DIANNE S HULL 30646
## 11 AARON, HENRY ATLANTA 30311
## 12 AARON, SCOTT RINGGOLD 30736
## 13 AARONSON, ANDREW MARIETTA 30062
## 14 ABASS, AHMAD MARIETTA 30066
## 15 ABBISS, RICHARD MARIETTA 30068
## 16 ABBISS, RICHARD MARIETTA 30068
## 17 ABBITT, BILLIE ROSWELL 30076
## 18 ABBOT, JAMES ATLANTA 30307
## 19 ABBOTT, ANNA K MRS. JACKSON 30233
## 20 ABBOTT, DONA MRS. RABUN GAP 30568
## # ... with 4 more variables: contbr_employer <fctr>,
## # contbr_occupation <fctr>, cand_nm <fctr>, sumcontb <dbl>
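A minimal sketch of this kind of consolidation with dplyr is shown below. The toy rows and employer names are invented, and the real grouping used more columns (city and occupation as well):

```r
library(dplyr)

# Invented rows; one contributor appears with two different employers.
ga <- data.frame(
  contbr_nm = c("AARON, DAVID", "AARON, DAVID", "AARON, DAVID", "ABBOT, JAMES"),
  contbr_zip = c(30125, 30125, 30125, 30307),
  contbr_employer = c("EMPLOYER A", "EMPLOYER A", "EMPLOYER B", "EMPLOYER C"),
  cand_nm = c("Trump, Donald J.", "Trump, Donald J.",
              "Trump, Donald J.", "Clinton, Hillary Rodham"),
  contb_receipt_amt = c(50, 100, 25, 10),
  stringsAsFactors = FALSE
)

contbr_totals <- ga %>%
  select(contbr_nm, contbr_zip, contbr_employer, cand_nm, contb_receipt_amt) %>%
  group_by(contbr_nm, contbr_zip, contbr_employer, cand_nm) %>%
  summarise(sumcontb = sum(contb_receipt_amt), .groups = "drop")

contbr_totals
```

Because employer is one of the grouping columns, the same name and zip code split into two rows as soon as the employer differs. Dropping a column from group_by merges those rows, at the risk of merging two different people.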
There are a few problems that make it challenging to do a proper analysis of individual contribution totals. The primary issue is that it’s difficult to definitively consolidate individual contributors into rows that accurately reflect the total they’ve contributed to particular candidates.
The reason for this problem is not that the information is inaccurate but that contributors may have given different answers to the same demographic questions with different contributions they gave. For example, the name David Aaron appears in rows 8 and 9 of the data.frame excerpted above. Both rows have the same name, city, zip code, profession, and candidate (Trump) but different employers are specified. Is this one person who changed employers at some point during the election? Or do these rows represent a father and son with the same name and profession? It’s hard to say for sure.
I’m curious if there’s any relationship between party and form type.
## ga$cand_party: Democrat
## SA17A SA18 SB28A
## 81197 12407 915
## --------------------------------------------------------
## ga$cand_party: Green
## SA17A SA18 SB28A
## 156 0 1
## --------------------------------------------------------
## ga$cand_party: Independent
## SA17A SA18 SB28A
## 41 0 0
## --------------------------------------------------------
## ga$cand_party: Libertarian
## SA17A SA18 SB28A
## 451 0 0
## --------------------------------------------------------
## ga$cand_party: Republican
## SA17A SA18 SB28A
## 34439 24247 131
There were almost twice as many contributions to Republican candidates from “authorized committees” as there were to Democratic candidates. Meanwhile nearly all of the contributions to third-party candidates were from individuals.
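Counts like these can be tabulated in base R, as in the sketch below on made-up rows. The row-wise proportions are an addition of mine, not part of the output above, and can make parties of very different sizes easier to compare:

```r
# Made-up rows; form_tp is a factor so that empty levels still show as 0.
ga <- data.frame(
  cand_party = c("Democrat", "Democrat", "Democrat",
                 "Republican", "Republican", "Green"),
  form_tp = factor(c("SA17A", "SA17A", "SA18", "SA17A", "SA18", "SA17A"),
                   levels = c("SA17A", "SA18", "SB28A"))
)

# One table of form types per party.
form_by_party <- by(ga$form_tp, ga$cand_party, table)

# The same counts as a single cross-tabulation, plus row-wise shares.
form_table <- table(ga$cand_party, ga$form_tp)
form_share <- prop.table(form_table, margin = 1)
form_share
```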
In the multivariate plots section I’ll take a closer look at whether there are any relationships between form type and contribution amount at different times in the election.
One relationship I observed is between date and the number of contributions to major-party candidates. In particular Trump’s contributions from Georgia spiked considerably around July 2016, the month of both the Republican and Democratic National Conventions.
Another relationship I observed is between party and contribution amount. Republican candidates received considerably higher contribution dollar amounts on average versus Democratic candidates. I thought this was particularly notable given the high number of Republican candidates who were in the running early in the election.
There were almost twice as many “authorized committee” contributions to Republican candidates as there were to Democratic candidates.
Additionally, all but one contribution to third-party nominees were made by individuals and not by committees.
I also realized that an individual contributor could give different but equally correct demographic answers with each contribution (a new employer, for example). The data would still be good in that case, even if it’s hard to organize by individual donor.
The strongest relationship I came across is the relationship between candidate party and contribution amount. For Trump and all the other Republican candidates, the mean and median contribution levels were much higher than they were for Democratic candidates, despite the glut of Republican candidates early on in the elections.
Expanding on my bivariate explorations, I’d like to look at how the individual candidates compared to each other as far as the contributions they’ve received over time.
I’ll look at Trump’s contribution amounts over time at first, starting with a scatterplot that shows all the contributions and refunds, including all the outliers, alongside the previous graph.
This is kind of ugly and overplotted, but it at least initially looks quite similar to the first graph I made of contributions over time. I’d like to compare this distribution with the plot of all contributions. In light of the earliest Trump data going back as far as late 2014, I’ll put this new plot and the original plot on the same scale to get a sense of the change over time.
The set of contributions to all candidates goes back as far as the very tail end of 2014 and the first contributions to Trump’s primary campaign start showing up in June 2015, and take a massive uptick shortly before July 2016. As mentioned earlier, this is when the Republican and Democratic Conventions were held.
Having these two graphs on the same y-scale and the same x-axis length reaffirms that a lot of the highest and lowest contribution amounts in the dataset are accounted for in the contributions to Trump.
Let’s look at the contributions over time to Clinton’s campaign and then compare them with the two scatterplots we’ve been looking at.
Once again there’s significant overplotting here, and, due to its variance, this data also shows up on a different y-scale than the one we saw looking at the state’s contributions to Trump’s campaign. The earlier summaries showed that all of the reported individual contributions to Clinton’s campaign fall between -2,700 and 2,700 dollars. The vast majority of contributions appear to be in the 1- to 500-dollar range.
Before zooming in to take a closer look, I’ll take a quick glance at how the previous two scatterplots compare with that of Clinton when all three are plotted on the same y-scale, positive and negative outliers included.
The contributions to Clinton are more overplotted than those received by Trump and are visibly higher in sheer quantity and in consistency of occurrence over time, even if the variance is not as high as that of Trump. This graph certainly makes it look like Clinton’s contributions comprise a big percentage of the overall contributions, but the overplotting makes it hard to tell what’s really going on.
I’m going to put all three of these plots on a y-scale that covers the range of contribution amounts and refunds that’s visible in Clinton’s scatterplot. I’ll also add some transparency and noise in order to be able to look at them in a way that’s useful.
It looks like both Clinton and Trump were drawing in a significant amount of contributions in the range of 100 to 300 dollars during the months between the party conventions and the election, but Clinton had a more consistent amount of contributions in the 1- to 100-dollar range up until the election.
After Trump won, a small amount of contributions came in before 2016’s end, many of which were around the 100-dollar amount.
In light of my discovery that the vast majority of contributions were 500 dollars or less, I’d like to look at a scatterplot of contributions in the 1- to 500-dollar range, and impose a geom_line over the scatterplot showing median contribution levels, using a different color for each group.
These median lines are really quite noisy so I’ll also try looking at this data using geom_smooth to plot conditional means.
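The two overlays can be sketched like this on random data. The data frame and group labels are placeholders, and the `fun` argument assumes a reasonably recent ggplot2 (older versions used `fun.y`):

```r
library(ggplot2)

# Random contributions for two placeholder groups.
set.seed(7)
n <- 400
contribs <- data.frame(
  contb_receipt_dt = as.Date("2016-01-01") + sample(0:300, n, replace = TRUE),
  contb_receipt_amt = sample(c(5, 10, 25, 50, 100, 250, 500), n, replace = TRUE),
  group = sample(c("Clinton", "Trump"), n, replace = TRUE)
)

base_plot <- ggplot(contribs,
                    aes(x = contb_receipt_dt, y = contb_receipt_amt,
                        color = group)) +
  geom_point(alpha = 1/10)

# Median amount per date and group; this tends to be jagged.
median_plot <- base_plot + geom_line(stat = "summary", fun = median)

# Smoothed conditional mean per group; ggplot2 picks the smoothing method.
smooth_plot <- base_plot + geom_smooth()
```

The median line recomputes a separate statistic for every date, which is why it is so noisy. The smoother pools neighboring dates, trading detail for a readable trend.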
It’d take some careful parsing to attribute particular events to the various peaks and dips in these graphs, but there are some noticeable trends going on here. It looks like although Trump got a significant amount of contributions from Georgia around the Republican National Convention in July, the number of contributions he received later on was for the most part a trickle compared to the amount of contributions coming in for Clinton.
However, the mean and median contribution amounts are, for much of election season, much higher for Trump than they were for Clinton. He may not have had the same number of supporters in Georgia as Clinton did, but the ones who pledged money to his campaign gave bigger sums of money on average. There is a significant jump up in mean contribution amount for Trump in the couple of months after his nomination.
It’s notable that the time periods in which there are a high number of contributions coincide with a lower median contribution amount. This seems to make sense in light of the fact that most people are unlikely to be able to give very high contributions to presidential candidates, even if they want to.
I’m going to zero in on the period of the general election - July through November of 2016. It’d be interesting to see what donations under $500 looked like over time during this period. In addition to a scatterplot showing all the contributions, as well as scatterplots of the contributions received by Clinton and Trump, I’ll make a fourth scatterplot showing contributions received by third-party candidates during the general election.
The bulk of the contributions received during the general election clearly went to Clinton, whose solid black streak under the $100 level through election day represents the most consistent intake over time. With the level of transparency here, the contributions to third-party candidates are negligible, with the darkest area showing during the month of July (and not a very dark area at that).
I do wonder about the ebb and flow of contributions to Trump in Georgia over the course of this timespan, particularly because he won the state despite being out-fundraised. Perhaps the presidential debates had some kind of effect on the number of contributions that came in. I’ll insert some vertical dotted lines representing the presidential debate dates - 9/26 (blue), 10/9 (purple), and 10/19 (red).
There appear to be very slight bumps up in contributions for both Clinton and Trump after the first debate on September 26th. This is noticeable in the streaks that show up at the 100-, 200-, 250-, and 400-dollar levels for Trump, and at the 100- and 250-dollar levels for Clinton in addition to her already sustained levels of contributions of 50 dollars or less.
It’s more difficult to notice any change after the second debate on October 9th. There may be a very slight bump to contributions to Trump at the 100-, 200-, and 250-dollar levels. The third debate seems to coincide with darkened sections of the scatterplot showing up at the 200-, 250-, and 400-dollar levels for Trump.
There were several dramatic news stories during the election that seemed to be turning points. I’m curious whether the tape that surfaced of Trump’s lewd conversation with Billy Bush (released 10/7), as well as FBI director James Comey’s letter to the U.S. Congress (released 10/28) had any noticeable effect on the number and amount of contributions coming in to the major candidates.
I’m having a harder time noticing any effects here - neither of these vertical lines representing the dates in question appear to correspond with any major changes in the scatterplot. There may be a very slight increase in contributions to Clinton after the Trump-Billy Bush tape, but it’s hard to tell. That’s not to say that these events didn’t have an effect on the election - the Comey letter in particular is widely argued to have cost Clinton the election. Effects on the number of contributions and the amounts of these contributions are more difficult to pick out in these scatterplots.
The next set of multivariate plots I’d be interested in looking at are total contributions in dollars received each day by each candidate over the course of the primary and general elections in 2016.
What I’ll do first is use dplyr to select and reshape the data in order to get daily total contributions for every candidate in an R data.frame.
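A sketch of that reshaping step on a handful of invented rows is below. The output column name `total` is my own choice:

```r
library(dplyr)

# Invented rows: two candidates, two days.
ga <- data.frame(
  cand_nm = c("Clinton, Hillary Rodham", "Clinton, Hillary Rodham",
              "Trump, Donald J.", "Trump, Donald J.", "Trump, Donald J."),
  contb_receipt_dt = as.Date(c("2016-10-01", "2016-10-01",
                               "2016-10-01", "2016-10-02", "2016-10-02")),
  contb_receipt_amt = c(25, 50, 100, 200, -50),
  stringsAsFactors = FALSE
)

# One row per candidate per day, with refunds netted against contributions.
daily_totals <- ga %>%
  select(cand_nm, contb_receipt_dt, contb_receipt_amt) %>%
  group_by(cand_nm, contb_receipt_dt) %>%
  summarise(total = sum(contb_receipt_amt), .groups = "drop") %>%
  arrange(contb_receipt_dt)

daily_totals
```

Swapping cand_nm for cand_party or cand_gender in both select and group_by gives the by-party and by-gender versions of the same table.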
I’ll do a set of line graphs on one set of axes using ggplot2. The color of each line in the plot represents an individual candidate.
This graph is colorful but also very noisy. It’s easy to see the pink line on the right end of the graph representing Trump, which has a huge spike in the middle and has a lot of noisy variance approaching the election. More colors are perceptible on the left side of the graph when there were still a lot of Republican candidates contending for the nomination.
It may be helpful to zoom in on this a bit. I’ll look at 2016 only and see if this imparts anything useful.
As colorful as this is, it’s an incredibly noisy graph and is kind of hard to glean information from.
Perhaps looking at daily contribution totals by party would be useful. First I’ll use dplyr to create a new data.frame containing daily contribution totals by party.
Next I’ll plot contribution totals by day in a line graph, with the line colors reflecting political party.
This is much easier to read than the graph showing every one of the candidates. It appears that the most successful fundraising parties day by day were the Democrats and the Republicans, with both of those parties intermittently passing up each other as far as daily total contributions in dollars. The variance in Trump’s daily totals is much higher than that of Clinton, but he still appeared to out-fundraise her on at least a few days during the election.
I’d also like to take a look at daily totals by candidate gender. I’ll use dplyr to make a data.frame that reflects daily totals by candidate gender.
It appears that for most of 2016 there were more high daily dollar-amounts going to male candidates than female candidates. There are just a few days in the first half of the year in which female candidates out-fundraise the male candidates. The fact that female candidates managed to be ahead at this stage, when there are so many male competitors, seems noteworthy.
I’d like to see if those turning points I looked at a bit earlier have any sort of discernible pattern atop the graph of contributions by party.
First I’ll add dashed vertical lines representing the debates to the graphs of daily totals by party.
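Here is one way the debate markers could be layered over daily totals, using random numbers in place of the real totals. The data frame names are placeholders, and the three dates are the 2016 debate dates mentioned earlier:

```r
library(ggplot2)

# Random daily totals for two parties around the debate period.
days <- seq(as.Date("2016-09-20"), as.Date("2016-10-25"), by = "day")
set.seed(42)
party_daily <- data.frame(
  contb_receipt_dt = rep(days, times = 2),
  cand_party = rep(c("Democrat", "Republican"), each = length(days)),
  total = round(runif(2 * length(days), min = 1000, max = 20000))
)

# The three presidential debate dates.
debates <- data.frame(
  debate_dt = as.Date(c("2016-09-26", "2016-10-09", "2016-10-19"))
)

debate_plot <- ggplot(party_daily,
                      aes(x = contb_receipt_dt, y = total, color = cand_party)) +
  geom_line() +
  # Mapping xintercept from a data frame keeps the values as Dates.
  geom_vline(data = debates, aes(xintercept = debate_dt), linetype = "dashed")
```

The same pattern works for the other event dates: add rows to the small data frame, or use a second geom_vline layer with a different color.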
These line graphs give a much different look at the potential effects of the presidential debates on campaign contributions from Georgia.
All three debates appear to mark points where Georgia’s contributions to the Democratic candidate (Clinton) dropped and contributions to the Republican candidate (Trump) increased. There’s all sorts of up-and-down activity before, in between, and after the debates, but all of a sudden it looks like the debates helped Trump and hurt Clinton when it came to Georgia contributions.
There doesn’t appear to be a whole lot of connection between the debates and the contributions hauled in by the third-party candidates. This would seem to make sense in light of the third-party candidates not being invited to take part in the televised debates. However, one of the main spikes in contributions to the Libertarian candidate, Gary Johnson, shows up shortly after the first debate. Perhaps right-leaning voters in the state took an increased interest in Johnson after the first debate.
Next I’ll turn to the aforementioned turning point events - the Billy Bush tape and James Comey’s letter to Congress. I’ll try laying lines representing those events over the graphs of daily totals by party.
There was a drop in contributions to both Trump and Clinton right after the release of the Billy Bush tape. Additionally, there appears to be a drop in contributions for Trump right after the James Comey letter while there’s a brief increase in contribution totals for Clinton. These outcomes are rather different than I expected - I thought that the Billy Bush tape would help Clinton and hurt only Trump, and thought that the Comey letter would help Trump and hurt only Clinton. It looks like, in this particular case, my expected outcomes were incorrect.
I am curious whether contribution amounts are affected by the type of election taking place, whether it’s the primary or general election. First I’ll use the by function to take a look at summaries of contribution amounts in each election.
## ga$election_tp: G2016
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5900.0 19.0 37.0 117.6 100.0 5400.0
## --------------------------------------------------------
## ga$election_tp: O2016
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 250.0 300.0 500.0 610.9 500.0 2700.0
## --------------------------------------------------------
## ga$election_tp: P2016
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5400.00 19.27 35.00 111.30 80.00 12500.00
## --------------------------------------------------------
## ga$election_tp: P2020
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2600 2600 2600 2600 2600 2600
The mean and median contribution amounts for primary elections (labeled as P2016 in the election_tp column) are only slightly lower than the mean and median for general elections (labeled as G2016 in the election_tp column). At first glance, it doesn’t appear that there’s a huge difference in overall contribution amounts between the 2016 primary and general elections.
I’d like to look more specifically at summaries of contribution amounts to the two major-party candidates in each election.
## trump$election_tp: G2016
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5900.0 28.0 80.0 206.2 250.0 5400.0
## --------------------------------------------------------
## trump$election_tp: O2016
## NULL
## --------------------------------------------------------
## trump$election_tp: P2016
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5400.0 28.0 40.0 123.1 80.0 12500.0
## --------------------------------------------------------
## trump$election_tp: P2020
## NULL
## clinton$election_tp: G2016
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700.00 12.00 25.00 88.59 75.00 2700.00
## --------------------------------------------------------
## clinton$election_tp: O2016
## NULL
## --------------------------------------------------------
## clinton$election_tp: P2016
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700.00 10.00 25.00 128.20 95.48 2700.00
## --------------------------------------------------------
## clinton$election_tp: P2020
## NULL
Trump’s median contribution amount doubled once the general election was underway, from 40 to 80 dollars, and his mean contribution amount rose from 123.1 to 206.2 dollars. Clinton’s median contribution amount, meanwhile, was exactly the same for the primary and general - 25 dollars - and the mean contribution amount went down during the general election from 128.2 dollars to 88.59 dollars.
In light of my earlier finding that around 450 transactions have an election_tp of “NA” I’ll take a look at a summary of contribution amounts labeled this way.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5375.0 80.0 240.0 284.9 400.0 2700.0
The range of contribution amounts for transactions labeled “NA” is huge - there’s a difference of roughly 8,000 dollars between the biggest refund (-5,375 dollars) and the maximum contribution of 2,700 dollars. More noteworthy than these outlying values is the fact that the mean and median contribution amounts for contributions not associated with an election type are considerably higher than the overall mean and median contribution levels or the mean and median for either major-party candidate. Over 400 of these “NA” contributions went to Trump’s campaign.
Let’s look at a quick scatterplot of campaign contributions over time colored by the type of election. In order to keep the scatterplot from being overplotted I’ll set the transparency to 1/100 as I have with earlier scatterplots.
As the dates progress from left to right, we see that the colors change from turquoise, representing the primary election, to vermilion, representing the general election. The variance is not visibly different between these two elections, but the area between about 500 and 1,000 dollars becomes more filled in toward the end of 2016 as Election Day approaches in early November.
It’s not totally clear whether election type is strictly associated with date, especially considering the “other” 2020 category attached to one contribution. I’ll look at a summary of contribution date by election type.
## ga$election_tp: G2016
## Min. 1st Qu. Median Mean 3rd Qu.
## "2013-10-29" "2016-08-19" "2016-09-27" "2016-09-20" "2016-10-24"
## Max.
## "2016-12-31"
## --------------------------------------------------------
## ga$election_tp: O2016
## Min. 1st Qu. Median Mean 3rd Qu.
## "2016-11-23" "2016-11-23" "2016-11-24" "2016-11-24" "2016-11-24"
## Max.
## "2016-11-28"
## --------------------------------------------------------
## ga$election_tp: P2016
## Min. 1st Qu. Median Mean 3rd Qu.
## "2013-10-29" "2016-01-28" "2016-03-27" "2016-03-13" "2016-06-08"
## Max.
## "2016-11-06"
## --------------------------------------------------------
## ga$election_tp: P2020
## Min. 1st Qu. Median Mean 3rd Qu.
## "2016-01-27" "2016-01-27" "2016-01-27" "2016-01-27" "2016-01-27"
## Max.
## "2016-01-27"
General and primary election contributions start showing up as early as October 2013, it turns out. Even though it’s hard to see any real crossover in the scatterplot, there are some general election contributions sprinkled in during the period before the general election actually began.
I’d like to take a look at the data in the previous graph using a geom_smooth with a linear model method. Maybe smooth conditional means will be helpful for understanding contribution amounts over time as they relate to election type.
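As a sketch of what that looks like in ggplot2, again on simulated data: `geom_smooth(method = "lm")` fits one straight line per color group. Spelling out `formula = y ~ x` is optional and only silences the message ggplot2 prints when it picks the formula itself.

```r
library(ggplot2)

# Simulated contributions; only the column names mirror the real data.
set.seed(3)
ga <- data.frame(
  contb_receipt_dt = as.Date("2016-01-01") + sample(0:330, 200, replace = TRUE),
  contb_receipt_amt = round(rexp(200, rate = 1 / 100), 2)
)
# Arbitrary cutoff, used only to give the toy rows an election type.
ga$election_tp <- ifelse(ga$contb_receipt_dt < as.Date("2016-07-01"), "P2016", "G2016")

p <- ggplot(ga, aes(x = contb_receipt_dt, y = contb_receipt_amt, colour = election_tp)) +
  geom_point(alpha = 1 / 100) +
  # One linear fit per election type, with the default confidence band.
  geom_smooth(method = "lm", formula = y ~ x)
print(p)
```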
The smoothers are a helpful but slightly oversimplified representation of the contribution data over time. It’s obvious that there is high variance in general election contribution amounts in 2015, at which point many candidates are starting to announce their intentions of running in 2016. As the data enters 2016, the means have much less variance, and the mean contribution to the general election remains a bit higher on average than the mean contribution to the primary election. The mean and median contribution amounts vary from candidate to candidate but these smoothers give a sense of the gradual decrease in mean contribution amounts as nominations solidify and more and more people contribute to the various campaigns.
The approximately 450 contributions with election type “NA” have quite a bit of variance, and seem to increase in mean amount as Election Day approaches. Additionally, the handful of “other” 2016 contributions are represented by a short, nearly vertical green line that reflects the high mean contribution amount associated with those contributions.
I think what accounts for the high variance visible on the left side of the graph of general election contributions is the relatively small number of contributions coming in at that time. One or a few high contributions around this time could set the mean and median as high as 2,700 dollars if that’s the only contribution received that day.
As more people start getting engaged with the campaigns underway, the mean comes down. I think this may be because, as mentioned earlier, most people don’t have enough expendable income to pledge high amounts of money to presidential campaigns, even if they want to.
After loading in the dataset I added columns for candidate party and gender. I’d like to take a quick look at whether there are noticeable differences in contribution amounts based on gender and political party.
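One simple way to add columns like these is a pair of named lookup vectors keyed on `cand_nm`. This is only a sketch with four of the candidates and made-up amounts. The lookup vector names are hypothetical, and the full analysis would need an entry for every candidate in the file.

```r
# Toy rows; the candidate names are copied from the summary output earlier on.
ga <- data.frame(
  cand_nm = c("Clinton, Hillary Rodham", "Trump, Donald J.",
              "Sanders, Bernard", "Cruz, Rafael Edward 'Ted'"),
  contb_receipt_amt = c(25, 100, 27, 50),
  stringsAsFactors = FALSE
)

# Hypothetical lookup tables: candidate name -> party / gender.
party_lookup <- c(
  "Clinton, Hillary Rodham"   = "Democrat",
  "Trump, Donald J."          = "Republican",
  "Sanders, Bernard"          = "Democrat",
  "Cruz, Rafael Edward 'Ted'" = "Republican"
)
gender_lookup <- c(
  "Clinton, Hillary Rodham"   = "female",
  "Trump, Donald J."          = "male",
  "Sanders, Bernard"          = "male",
  "Cruz, Rafael Edward 'Ted'" = "male"
)

# Index the lookups by candidate name to build the two new columns.
ga$cand_party  <- unname(party_lookup[ga$cand_nm])
ga$cand_gender <- unname(gender_lookup[ga$cand_nm])
print(ga)
```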
The fact that Trump’s mean and median contribution amounts both inside and outside of Atlanta are so much higher than Clinton’s leads me to guess that contributions to the Republican candidates have higher amounts on average than contributions to Democratic candidates.
First I’ll use the by function to get an idea of what the distributions look like.
## ga$cand_party: Democrat
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700.0 10.0 25.0 86.2 50.0 5400.0
## --------------------------------------------------------
## ga$cand_party: Green
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -965.0 49.0 100.0 219.8 250.0 2700.0
## --------------------------------------------------------
## ga$cand_party: Independent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5 25 100 146 250 500
## --------------------------------------------------------
## ga$cand_party: Libertarian
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3 30 100 224 250 2700
## --------------------------------------------------------
## ga$cand_party: Republican
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5900.0 25.0 50.0 158.4 100.0 12500.0
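The call behind output like this is short. Here is a sketch on a six-row toy data frame. The `tapply` line is an extra I added to pull out just the medians, since those are what I end up comparing.

```r
# Toy stand-in for `ga`; amounts are made up.
ga <- data.frame(
  cand_party = c("Democrat", "Democrat", "Democrat",
                 "Republican", "Republican", "Republican"),
  contb_receipt_amt = c(10, 25, 50, 25, 50, 100)
)

# by() splits the amounts by party and applies summary() to each piece.
amt_by_party <- by(ga$contb_receipt_amt, ga$cand_party, summary)
print(amt_by_party)

# tapply() returns a plain named vector when only one statistic is needed.
party_medians <- tapply(ga$contb_receipt_amt, ga$cand_party, median)
print(party_medians)
```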
I’ll also look at a scatterplot with transparency added and zoomed in on contributions under 500 dollars.
Democrats have the lowest median and mean contribution amounts by far of all the parties represented in the data set. The median amount (25 dollars) is half that of the Republican median of 50 dollars, while the mean contribution amount for Republican candidates is 72 dollars higher than the mean contribution to Democrats.
As for third-party contributions, Green, Independent, and Libertarian candidates all have a median contribution level of 100. The Libertarian candidate, Gary Johnson, has the highest mean contribution amount at 224 dollars, followed closely by Jill Stein at 219.80.
I’d also like to look at how contribution amounts break down by gender. Once again I’ll run the by function and plot a scatterplot with a transparency of 1/100, zooming in on contributions of 500 dollars or less.
## ga$cand_gender: female
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700.00 11.55 25.00 104.30 75.00 5400.00
## --------------------------------------------------------
## ga$cand_gender: male
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5900.0 25.0 40.0 122.5 100.0 12500.0
The median contribution to female candidates is 25 dollars while the median contribution to male candidates is 40 dollars. Mean contributions are higher for men as well: 122.5 dollars to men versus 104.3 dollars to women. Once again it’s not immediately apparent whether the lower mean and median for the female candidates have to do with Clinton’s high number of contributions, or whether they are due rather to her party affiliation or some other cause.
I’m curious how contributions from voters in Atlanta compared to contributions from the rest of the state. There are 10 counties that are considered to be part of the metro Atlanta area, but I’m only going to deal with contributions from the City of Atlanta for the purposes of this investigation.
It’d be simplest to create subsets of the data by setting the city name equal to or not equal to ‘Atlanta’. However, in a quick glance at the data, it looks like there are some misspellings of the city names here and there. So what I’ll do is create a subset of Atlanta contributions using the contributor zip code column, contbr_zip.
I’ll use the dplyr package to do some subsetting and reshaping of the data. I’ll also filter the data to contributions to Trump and Clinton, convert the zip code column to numeric format, and truncate the entries in the zip code column to five digits.
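Here is a minimal sketch of that step on a few toy rows. The zip codes in the raw data come in both five-digit and nine-digit forms, so I take the first five characters before converting to numeric. The `zip5` column name is hypothetical. Converting to numeric would drop leading zeros, which is fine for Georgia since its zip codes start with 3.

```r
library(dplyr)

# Toy rows mixing 5-digit and 9-digit zip codes, as in the raw data.
ga <- data.frame(
  cand_nm = c("Clinton, Hillary Rodham", "Trump, Donald J.",
              "Sanders, Bernard", "Clinton, Hillary Rodham"),
  contbr_zip = c("300842510", "30144", "30307", "306061993"),
  contb_receipt_amt = c(100, 99.79, 27, 138.55),
  stringsAsFactors = FALSE
)

major <- ga %>%
  # Keep only the two major-party nominees.
  filter(cand_nm %in% c("Clinton, Hillary Rodham", "Trump, Donald J.")) %>%
  # Truncate to the first five digits, then convert to numeric.
  mutate(zip5 = as.numeric(substr(as.character(contbr_zip), 1, 5)))
print(major)
```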
The USPS website has a complete list of zip codes for the city of Atlanta. I’ll use this list to create a vector containing all of the Atlanta zip codes. I’ll also add a column containing “TRUE” or “FALSE” depending on whether the zip code associated with the contribution was in the City of Atlanta or not, and filter the dataframe to create two subsetted data frames containing contributions from Atlanta and contributions from outside of Atlanta.
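In code, the flag boils down to an `%in%` test against the zip vector. The sketch below uses a short illustrative subset of Atlanta zip codes rather than the full USPS list, and the `atl_zips`, `atlanta`, and `not_atlanta` names are placeholders of mine. The `atl` column name matches the one that appears in later output.

```r
# Short illustrative subset of Atlanta zip codes, NOT the full USPS list.
atl_zips <- c(30303, 30305, 30306, 30307, 30308, 30309)

# Toy rows with already-truncated five-digit zips.
major <- data.frame(
  zip5 = c(30307, 30144, 30309, 31097),
  contb_receipt_amt = c(37, 60, 100, 69.14)
)

# TRUE when the contribution's zip code is in the Atlanta list.
major$atl <- major$zip5 %in% atl_zips

# Split into the two subsets used in the rest of this section.
atlanta     <- major[major$atl, ]
not_atlanta <- major[!major$atl, ]
print(major)
```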
Next I’ll create a set of two dataframes for Atlanta contributions to Clinton and Atlanta contributions to Trump, and run some statistical summaries on them.
Summary of contributions to Clinton from Atlanta:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700.0 19.0 37.0 158.6 100.0 2700.0
Count of contributions to Clinton from Atlanta:
## [1] 23968 5
Sum of contributions to Clinton from Atlanta:
## [1] 3802396
Summary of contributions to Trump from Atlanta:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700.0 40.0 100.0 336.3 250.0 5000.0
Count of contributions to Trump from Atlanta:
## [1] 2913 5
Sum of contributions to Trump from Atlanta:
## [1] 979781.7
Clinton received almost 24,000 contributions from Atlanta whereas Trump received only 2,913 contributions from Atlanta. However, the median and mean contribution amounts for Trump are considerably higher than those for Clinton: a median of 100 dollars versus 37 dollars, and a mean of 336.3 dollars versus 158.6 dollars.
I’d like to look at histograms of the number of contributions from Atlanta to the major-party candidates.
It’s a bit hard to see what’s going on in this distribution so I’m going to apply a log10 scale and see if that helps with understanding it.
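A sketch of the log10 version on simulated amounts is below. One caveat worth flagging: refunds are negative, and a log scale cannot show zero or negative values, so in this sketch I filter down to positive amounts first. That filtering is my assumption here. Without it ggplot2 drops those rows with a warning.

```r
library(ggplot2)

# Simulated Atlanta contributions; the medians loosely echo the summaries above.
set.seed(2)
atlanta <- data.frame(
  cand_nm = rep(c("Clinton, Hillary Rodham", "Trump, Donald J."), times = c(300, 60)),
  contb_receipt_amt = c(round(rlnorm(300, log(37), 1), 2),
                        round(rlnorm(60, log(100), 1), 2))
)

# A log10 axis is undefined for refunds (negative) and zeros.
positive <- subset(atlanta, contb_receipt_amt > 0)

p <- ggplot(positive, aes(x = contb_receipt_amt, fill = cand_nm)) +
  geom_histogram(bins = 30, position = "identity", alpha = 0.5) +
  scale_x_log10()
print(p)
```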
Here we have two normal-ish distributions for Clinton and Trump. These distributions reflect the fact that Clinton received way more individual contributions than Trump did in Atlanta. It’s a bit harder to tell that the mean and median contribution amounts were greater for Trump but we can see that his distribution is a bit further to the right on the x-axis than Clinton’s is.
Next let’s look at a boxplot of contributions from Atlanta to both major-party candidates.
Without zooming in, we can see that there is more variance in receipt amounts for Trump than there is for Clinton, whose contributions remain in the -2,700- to 2,700-dollar range. Let’s zoom in using coord_cartesian to get a closer look at the interquartile range without removing any values from the data. I’ll also add a red “x” using stat_summary marking the mean contribution amount for each candidate.
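Roughly, the zoomed boxplot can be built like this. The data is a ten-row toy set and the 0 to 500 window is an arbitrary choice for the sketch. The key point is that `coord_cartesian` only changes the viewing window, so the quartiles are still computed from every row, whereas setting limits on the y scale would discard the out-of-range rows first. The `fun` argument of `stat_summary` assumes ggplot2 3.3.0 or later (older versions call it `fun.y`).

```r
library(ggplot2)

# Toy amounts per candidate, including large refunds and contributions.
atlanta <- data.frame(
  cand_nm = rep(c("Clinton", "Trump"), each = 5),
  contb_receipt_amt = c(-2700, 19, 37, 100, 2700,
                        -2700, 40, 100, 250, 5000)
)

p <- ggplot(atlanta, aes(x = cand_nm, y = contb_receipt_amt)) +
  geom_boxplot() +
  # Red "x" (shape 4) at each candidate's mean contribution amount.
  stat_summary(fun = mean, geom = "point", shape = 4, colour = "red") +
  # Zoom the view without removing any rows from the statistics.
  coord_cartesian(ylim = c(0, 500))
print(p)
```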
This zoomed-in version makes it clear that the middle half of Clinton’s contributions from Atlanta fell between roughly 19 and 100 dollars each, while the middle half of the contributions to Trump were much higher, between 40 and 250 dollars each. The median and mean contribution amounts are visibly much higher for Trump than for Clinton, but the thickness of the Clinton boxplot’s whiskers reflects the higher quantity of contributions she received.
Now let’s take a look at contributions from outside Atlanta.
Summary of contributions to Clinton from Georgia (outside Atlanta):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700 10 25 73 50 2700
Count of contributions to Clinton from Georgia (outside Atlanta):
## [1] 44419 5
Sum of contributions to Clinton from Georgia (outside Atlanta):
## [1] 3242700
Summary of contributions to Trump from Georgia (outside Atlanta):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5900.0 28.0 60.0 147.7 160.0 12500.0
Count of contributions to Trump from Georgia (outside Atlanta):
## [1] 26777 5
Sum of contributions to Trump from Georgia (outside Atlanta):
## [1] 3955830
The contrast between the two candidates’ contribution amounts is stark outside of Atlanta. The mean and median contribution amounts for Trump are at least twice those of Clinton. While Clinton received 44,419 contributions outside Atlanta, Trump received 26,777 contributions and still out-fundraised her by over 700,000 dollars. I’ll look at some visual representations of this.
This first stab at a histogram of non-Atlanta contributions is a bit hard to read so I’ll do a log10 transformation of the x-axis scale as I did with the distributions of the Atlanta contributions.
Compared to the distribution of his Atlanta contributions, Trump has a more robust normal distribution of individual contributions received from outside of Atlanta. The y-scale of the count of contributions received is much higher as well for both distributions. It’d be helpful to look at these contributions all on the same scale, so I’ll save the plots I previously made into variables and use the grid.arrange function from the gridExtra package to look at them all at once.
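Here is a sketch of the `grid.arrange` step with two toy histograms. As far as I know `grid.arrange` only lays the plots out and does not sync their axes, so I use `expand_limits` with the same x range in both plots to make them comparable. That range and the plot variable names are choices made for this example.

```r
library(ggplot2)
library(gridExtra)

# Toy positive contribution amounts for the two regions.
atlanta     <- data.frame(contb_receipt_amt = c(5, 19, 37, 37, 100, 250, 1000, 2700))
not_atlanta <- data.frame(contb_receipt_amt = c(3, 10, 25, 25, 50, 60, 160, 500))

# Helper that builds one log10 histogram with a shared x range.
make_hist <- function(df, title) {
  ggplot(df, aes(x = contb_receipt_amt)) +
    geom_histogram(bins = 10) +
    scale_x_log10() +
    expand_limits(x = c(1, 3000)) +
    ggtitle(title)
}

p_atl  <- make_hist(atlanta, "Atlanta")
p_rest <- make_hist(not_atlanta, "Outside Atlanta")

# Stack the saved plots in one column; the layout object is returned invisibly.
combined <- grid.arrange(p_atl, p_rest, ncol = 1)
```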
Looking at everything at once, it’s apparent that Trump did way better outside of Atlanta than he did inside Atlanta. This seems to fit with what I’ve heard about Atlanta skewing much more liberal than conservative compared to the rest of the state.
Let’s also look at boxplots of the non-Atlanta contributions for comparison.
This initial boxplot reflects the high variance of contributions to Trump from outside of Atlanta, with amounts ranging from -5,900 to 12,500. Let’s zoom in on the interquartile ranges and get a better look at where most of the distribution’s values lie.
As with contributions from Atlanta, the mean and median contribution amounts for Trump are much higher than those of Clinton when it comes to contributions in the rest of the state. There is also considerably more variance in contribution amounts for Trump than for Clinton.
I’d like to look at all the boxplots at once and see if that provides any further insight.
It’s notable that the shapes of the boxplots have similar size ratios for both Atlanta and non-Atlanta contributions. The mean and median contributions to both candidates are higher in Atlanta than they are in the rest of the state. I’m not sure if this means that Atlanta residents are more affluent on average than people outside the city, or if it’s the larger number of contributions to the two candidates from outside Atlanta (approximately 71,000 versus approximately 27,000 from Atlanta) that brings down the mean and median.
I’m curious whether there are any associations between the form type and location in terms of Atlanta vs. non-Atlanta contributions.
## ga_form_data$atl: FALSE
## SA17A SA18 SB28A
## 86643 29776 580
## --------------------------------------------------------
## ga_form_data$atl: TRUE
## SA17A SA18 SB28A
## 29641 6878 467
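Counts like these come from combining `by` with `table`. Below is a sketch on a six-row toy version of `ga_form_data`. I store `form_tp` as a factor so that every form type is listed in both groups even when a count is zero, and the cross-tab at the end is an extra I added to hold the same numbers in one object.

```r
# Toy stand-in for `ga_form_data`; `atl` flags City of Atlanta zip codes.
ga_form_data <- data.frame(
  form_tp = factor(c("SA17A", "SA17A", "SA18", "SB28A", "SA17A", "SA18")),
  atl = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

# One table of form-type counts per value of `atl`.
form_by_location <- by(ga_form_data$form_tp, ga_form_data$atl, table)
print(form_by_location)

# The same counts as a single two-way table.
counts <- table(ga_form_data$atl, ga_form_data$form_tp)
print(counts)
```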
There were almost three times as many individual contributions from outside Atlanta (86,643) as from inside Atlanta (29,641). Additionally there were more than four times as many “authorized committee” contributions from outside of Atlanta (29,776) as there were from Atlanta (6,878).
I’m wondering if there is any connection between form type and contribution amount.
As I saw in the univariate analysis, most of the contributions (116,284 out of 153,985) are classed as “Form 3P Line 17A”, meaning they are contributions from individuals, not committees. There are, however, 36,654 contributions designated as “Form 3P Line 18” (transfers from authorized committees) and 1,047 transactions categorized as “Form 3P Line 28A” (refunds to individuals).
I’ll look at some quick summaries and scatterplots to see if that tells me anything about a possible relationship between form type and contribution amount.
## ga$form_tp: SA17A
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5400 15 27 106 100 12500
## --------------------------------------------------------
## ga$form_tp: SA18
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700.0 28.0 50.0 152.5 100.0 5000.0
## --------------------------------------------------------
## ga$form_tp: SB28A
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5900.00 -200.00 -50.00 -304.50 -25.00 -0.05
Contributions categorized as being from authorized committees have higher mean and median contribution amounts - a mean of 152.5 dollars and a median of 50 dollars - than contributions categorized as being from individuals, which have a mean of 106 dollars and a median of 27 dollars.
I’ll look at contributions over time colored by form type. I’ll initially focus in on the area between 0 and 2700 dollars for the year 2016, and set the transparency low so I can get a better sense of what’s going on here.
It looks like the variance is mostly comparable for form types 17A and 18, and it’s no surprise that all the form 28A transactions are negative values as they are refunds.
Committee contributions are visibly a bit higher on average than individual “SA17A” contributions. Interestingly the committee contributions don’t really gear up in earnest until July 2016 when the party conventions are going on.
The overall variance in contribution amounts for committee contributions is not that much greater than that for regular individual contributions. Additionally the vast majority of the contributions appear to be less than 100 dollars.
I’ll zoom in just a bit closer to the area between 0 and 250 dollars and see if that adds anything to my understanding.
I can see here that most committee contributions tend to be at least 25 dollars and that most of the green points representing these contributions have, as I saw before, a higher average contribution amount.
Are there any details about the committees themselves to be found in the data? I’ll take a quick look at the first rows of a filtered data frame showing only committee contributions.
## cmte_id cand_id cand_nm contbr_nm
## 1 C00580100 P80001571 Trump, Donald J. ROPER, KEN
## 2 C00580100 P80001571 Trump, Donald J. SHARP, MICHAEL
## 3 C00575795 P00003392 Clinton, Hillary Rodham SANDIFER, JOE
## 4 C00580100 P80001571 Trump, Donald J. SHARP, PATRICK
## 5 C00580100 P80001571 Trump, Donald J. STANCIL, ANN
## 6 C00580100 P80001571 Trump, Donald J. STANCIL, RALPH MR.
## 7 C00580100 P80001571 Trump, Donald J. STANCIL, RALPH MR.
## 8 C00575795 P00003392 Clinton, Hillary Rodham EBERLE, ROXANNE
## 9 C00580100 P80001571 Trump, Donald J. SHAW, MICHAEL
## 10 C00580100 P80001571 Trump, Donald J. RAMSEY, CANDACE
## contbr_city contbr_st contbr_zip contbr_employer
## 1 KENNESAW GA 30144 INFORMATION REQUESTED
## 2 YATESVILLE GA 31097 RETIRED
## 3 TUCKER GA 300842510 INFORMATION REQUESTED
## 4 MABLETON GA 30126 WND
## 5 JULIETTE GA 31046 RETIRED
## 6 DAHLONEGA GA 30533 RETIRED
## 7 DAHLONEGA GA 30533 RETIRED
## 8 ATHENS GA 306061993 UNIVERSITY OF GEORGIA
## 9 FAIRBURN GA 30213 WYNDHAM HOTELS
## 10 SUWANEE GA 30024 INFORMATION REQUESTED
## contbr_occupation contb_receipt_amt contb_receipt_dt receipt_desc
## 1 INFORMATION REQUESTED 99.79 2016-10-22 <NA>
## 2 RETIRED 69.14 2016-10-23 <NA>
## 3 INFORMATION REQUESTED 100.00 2016-04-06 <NA>
## 4 INTERN 65.92 2016-08-31 <NA>
## 5 RETIRED 51.79 2016-10-09 <NA>
## 6 RETIRED -40.00 2016-08-24 <NA>
## 7 RETIRED -80.00 2016-09-03 <NA>
## 8 PROFESSOR 138.55 2016-04-13 <NA>
## 9 TRANSPORTATION 79.59 2016-10-30 <NA>
## 10 INFORMATION REQUESTED 67.14 2016-11-03 <NA>
## memo_cd memo_text form_tp file_num tran_id election_tp
## 1 X <NA> SA18 1146165 SA18.101809 G2016
## 2 X <NA> SA18 1146165 SA18.144011 G2016
## 3 X * HILLARY VICTORY FUND SA18 1091718 C4691736 P2016
## 4 X <NA> SA18 1146165 SA18.149107 G2016
## 5 X <NA> SA18 1146165 SA18.168222 G2016
## 6 X <NA> SA18 1146165 SA18.196991 G2016
## 7 X <NA> SA18 1146165 SA18.203600 G2016
## 8 X * HILLARY VICTORY FUND SA18 1091718 C4706469 P2016
## 9 X <NA> SA18 1146165 SA18.131093 G2016
## 10 X <NA> SA18 1146165 SA18.147239 G2016
## cand_party cand_gender
## 1 Republican male
## 2 Republican male
## 3 Democrat female
## 4 Republican male
## 5 Republican male
## 6 Republican male
## 7 Republican male
## 8 Democrat female
## 9 Republican male
## 10 Republican male
Other than the form type these contributions don’t look terribly different from individual contributions - there’s an individual name, city, state, zip code, occupation, and employer associated with each of these contributions.
I observed once again that mean and median contributions to Trump and other Republican candidates were much higher than mean and median contributions to Democratic candidates. It’s also clear that, while Clinton had lower contribution amounts on average, she received way more individual contributions throughout the state than Trump did.
There also seemed to be relationships between location and both contribution amounts and candidate contributed to. Atlanta contributions are higher on average than contributions from outside of Atlanta. Additionally, contributors from Atlanta appear much more likely to give to Clinton than to Trump. While Clinton enjoyed a considerable number of contributions from the rest of the state, it’s the rest of the state where Trump did most of his fundraising. Finally, it appears that there is a relationship between form type and party as well as form type and location (i.e. in or outside of Atlanta).
Looking at daily contribution dollar amount totals turned out to be very useful. I created line graphs of daily contribution totals broken down by candidate party and candidate gender. These graphs, when overlaid with vertical lines reflecting dates of debates and of a few significant campaign events, provided different insights than when I looked at these events in relation to a scatterplot of contributions over time.
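The general recipe is to sum contribution amounts per day and party, then draw the totals as lines with the event dates on top. Here is a sketch on a few made-up rows, with the three 2016 general-election debate dates (September 26, October 9, and October 19) as the vertical lines. The `daily`, `daily_total`, and `debates` names are hypothetical, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(ggplot2)

# Toy contributions around the first two debates; amounts are made up.
ga <- data.frame(
  contb_receipt_dt = as.Date(c("2016-09-25", "2016-09-25", "2016-09-25",
                               "2016-09-27", "2016-09-27",
                               "2016-10-10", "2016-10-10")),
  cand_party = c("Democrat", "Democrat", "Republican",
                 "Democrat", "Republican",
                 "Democrat", "Republican"),
  contb_receipt_amt = c(100, 20, 50, 25, 250, 40, 60)
)

# One row per day and party holding that day's dollar total.
daily <- ga %>%
  group_by(contb_receipt_dt, cand_party) %>%
  summarise(daily_total = sum(contb_receipt_amt), .groups = "drop")

# Dates of the three 2016 presidential debates.
debates <- data.frame(debate = as.Date(c("2016-09-26", "2016-10-09", "2016-10-19")))

p <- ggplot(daily, aes(x = contb_receipt_dt, y = daily_total, colour = cand_party)) +
  geom_line() +
  geom_vline(data = debates, aes(xintercept = debate), linetype = "dashed")
print(p)
```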
Contributions from authorized committees tended to have higher dollar amounts than contributions from individuals. Additionally, contributions from committees appeared to start coming in more rapidly once the general election was underway.
It did not appear that contribution amounts varied a great deal with election type. Some of the earliest contributions were very high but for the most part individual contributions were less than 100 dollars apiece.
When I looked at daily contribution totals by party and added vertical lines representing debates and turning points in the election, all of a sudden I was able to get a better sense of what the effects of those events may have been. This was not immediately apparent in the earlier parts of my explorations. Additionally, what these graphs with the events overlaid on them suggested is that the events had a different effect than I would have expected.
The difference between mean and median contribution amounts to primary and general elections was not immediately visible on a scatterplot or in summaries of the overall data. However, it turned out that Trump received much higher contribution amounts on average during the general election than he did during the primary election. Clinton’s average contribution amount went down once the general election began. The earlier the date of an election contribution the more variance there was observable in median contribution amount.
Contributions to male candidates were significantly higher in mean and median contribution amounts than were contributions to female candidates. However there were points during the primary election where female candidates had higher daily totals than did the male candidates, which is interesting in light of how many male candidates were still in the running at that point.
This plot shows how the number of individual contributions from Georgia to the two major-party candidates changed over the course of the election, after the primaries. During June, July, and August, there are several spikes in contributions to Trump. This is the time period in which the pool of Republican candidates narrowed and the Republican National Convention took place. However, Clinton received a higher number of contributions day-to-day from August through Election Day. Once Election Day arrives, contributions to Clinton end but Trump receives a trickle of contributions through early December.
The reason I chose this plot is because it surprised me that Clinton received more individual contributions day-to-day for much of general election season. It also gives an idea of how volatile the day-to-day numbers of contributions are.
Most contributions from Georgia to the various presidential candidates in the 2016 election were under 500 dollars. Looking at a scatterplot of these contributions over time, with a transparency of 1/100, gives a sense of the ebb and flow of contributions from Georgia residents during the general election period.
Clinton visibly has a more sustained number of contributions coming in, particularly at 100 dollars or less per contribution. Trump, on the other hand, has hot and cold periods, with a less sustained intake but a visibly higher average contribution amount, as well as more contributions at the 250- and 400-dollar levels.
This plot, which happens to look at potential election turning points, was itself a turning point in my analysis process. I had been looking at individual contribution amounts in relation to time, and struggled to find any kind of connection between certain election events and the contribution amounts that were coming in. Calculating the daily contribution totals and plotting them as line graphs broken up by party gave a better sense of what the effects of those events may have been.
After each presidential debate there is all sorts of dialogue in the media about who “won” each time. This is a difficult thing to quantify or assess with any kind of objectivity, but looking at short-term impacts on daily campaign fundraising at the state level might help give a regional picture of who “won.”
In this case, we can observe a couple of things. There’s a drop in contributions from Georgia to Clinton right after the first debate while the fundraising momentum that Trump had in Georgia prior to that debate continues to climb for at least a few days afterward. A similar pattern emerges with the second debate: Clinton’s daily contribution totals drop while Trump’s climb at that time continues for a few days afterward. The third debate marks the beginning of yet another drop in contributions from Georgia to Clinton, and while Trump’s contributions have a positive slope approaching that day, they decline sharply afterward, hitting nearly the same low that Clinton hits.
It’s also interesting to observe the small spikes in contributions to Gary Johnson, the Libertarian nominee, and Jill Stein, the Green nominee. These spikes don’t appear to have too much of a connection to the debates, but there is a small spike in Johnson’s contributions after the first debate. Perhaps some right-leaning viewers of the first debate didn’t connect with Trump and decided to contribute to the Libertarian candidate.
It turned out there were far more ways to break down state-level election contributions than I thought. Even after looking at this data from all the angles that I have, I feel like I’ve only really scratched the surface of what took place over the course of the election from a campaign finance perspective.
The data posed some difficulties that I wasn’t sure how to get around. One I’d be particularly keen to resolve is consolidating contributions to the individual level with the use of dplyr or another reshaping package. This seems almost impossible to do given that contributor names and city names can be misspelled. Additionally, it’s entirely possible for someone to give multiple contributions and provide correct demographic information on one date that differs from the information provided on another date - a contributor may have moved cities or zip codes, started a new job, or changed their last name after a marriage.
It’s also difficult to address the outliers in the contribution amounts column. What makes this difficult are the reassignments that take place for compliance with campaign finance regulations. There is also a wide variety of memos associated with these contributions, the significance of which I did not have the bandwidth to fully delve into.
As far as what went well, I found that I was able to create an interesting and aesthetically appealing (to me, anyway) range of visuals that showed many different ways of considering this data. Histograms, boxplots, frequency polygons, line graphs, and scatterplots all offered windows to this data that opened my eyes to what took place in this election.
I was surprised by a few things. Firstly I got the sense that both Clinton and Sanders did far better from a fundraising standpoint than I thought they would in light of the state’s recent voting history. Secondly the effects of the turning points I looked at - the presidential debates, the Billy Bush tape, and Comey’s letter to Congress - on daily contribution totals were not what I would have expected.
There are future analyses I’d like to do with this dataset down the road. I’d like to analyze contributors from Atlanta suburbs versus the city itself, and also look at these two areas compared to the rest of the state. It’d also be interesting to look at a wider range of turning points and attempt to quantify the effect that these turning points did or did not have on the contribution amounts and totals.